PREFACE TO VOLUME II


This volume falls into five sections. The first, comprising chapters 17 to 20, deals 
with Estimation. The second, comprising chapters 21, 23, 24 and 26 to 28, covers the 
Theory of Statistical Tests, including the Analysis of Variance and Multivariate Analysis. 
The third, consisting of chapter 22, deals with Regression Analysis and completes the 
account of statistical relationship begun in chapters 13 to 16 of Volume I. In the fourth, 
chapter 25, I have tried to give an introductory account of the reaction of theoretical 
considerations on the Design of Statistical Inquiries. Finally, the fifth, comprising chapters 
29 and 30, deals with the Analysis of Time-Series.

The literature of statistical theory is now so vast that it seemed worth while devoting 
considerable space to a bibliography, which is given in Appendix B. Although it is far 
from complete, I hope that it will serve its purpose in guiding the student to the main 
sources. 

The chief problem in the writing of this volume arose in connection with the logic of 
statistical inference. Whenever possible I have kept the treatment objective. It is, 
I consider, unfair in a book of this kind not to present all sides of a case, particularly when 
there is so much disagreement among the authorities. Some day I hope to show that
this disagreement is more apparent than real, and that all the existing theories of inference
in probability differ essentially only in matters of taste in the choice of postulates. But
this book is not the place for such work, and for the present I am content to state the
position and to leave the reader to exercise his own choice.

The difficulty became most acute in dealing with confidence intervals and fiducial 
inference, where two approaches which at first sight appear identical can lead to different 
results. Rather than try to reconcile them I have written a separate chapter on each. 
Professor E. S. Pearson was kind enough to read the manuscript of chapter 19 and Professor 
R. A. Fisher that of chapter 20, so that I think their respective views are, at any rate, not 
misrepresented. I am very grateful to them both for their help in this connection.

My thanks are also due to Mr. P. A. Moran and Mr. A. J. H. Morrell, who cheerfully 
undertook to help with the proof reading and to whose painstaking scrutiny I owe the 
removal of a number of obscurities and errors. I shall be grateful to any reader who 
detects and notifies me of any further slips which have evaded us. Once again I have also 
to thank the publishers and the printers for the trouble they have taken in the production 
of the finished work. 

M. G. K. 

London, 

April, 194S. 






TABLE OF CONTENTS

CHAP.
17. Estimation: Likelihood
18. Estimation: Miscellaneous Methods
19. Confidence Intervals
20. Fiducial Inference
21. Some Common Tests of Significance
22. Regression
23. The Analysis of Variance (1)
24. The Analysis of Variance (2)
25. The Design of Sampling Inquiries
26. General Theory of Significance-Tests (1)
27. General Theory of Significance-Tests (2)
28. Multivariate Analysis
29. Time-Series (1)
30. Time-Series (2)
Appendix A: Addenda to Volume I
Appendix B: Bibliography
Index to Volume II




CHAPTER 17 


ESTIMATION: LIKELIHOOD 


The Problem 

17.1. On several occasions in previous chapters we have encountered the problem 
of estimating from a sample the values of the parameters of the parent population. We 
have hitherto dealt on somewhat intuitive lines with such questions as arose; for example,
in the theory of large samples we have taken the means and moments of the sample to be 
satisfactory estimates of the corresponding means and moments in the parent. 

We now proceed to study this branch of the subject in more detail. In the earlier 
part of the present chapter we shall examine the sort of criteria which are required of 
a “ good ” estimate and discuss the question whether there exist “ best ” estimates in 
any acceptable sense of the term. In the remainder of the chapter and in Chapter 18 
we shall consider various methods of obtaining estimates with the required properties. 
In Chapters 19 and 20 we shall look at the same problem from a rather different point of 
view and discuss the theories of confidence intervals and fiducial limits. 

17.2. It will be evident that if a sample is not random and nothing precise is known 
about the nature of the bias operating when it was chosen, very little can be inferred from 
it about the parent population. Certain conclusions of a trivial kind are sometimes pos- 
sible — for instance, if we take ten turnips from a pile of 100 and find that they weigh ten 
pounds altogether, the mean weight of turnips in the pile must be greater than one-tenth of 
a pound ; but such information is rarely of value, and estimation based on biassed samples 
remains very much a matter of individual opinion and cannot be reduced to exact and 
objective terms. We shall therefore confine our attention to random samples only. Our 
general problem, in its simplest terms, is then to estimate the value of a parameter in the 
parent from the information given by the sample. In the first instance we consider 
the case when only one parameter is to be estimated. The case of several parameters 
will be discussed later. 

17.3. Let us in the first place consider what we mean by “estimation”. We know,
or assume as a working hypothesis, that the parent population is distributed in a form
which would be completely determinate if we knew the value of some parameter $\theta$. We
are given a sample of values $x_1, \ldots, x_n$. We require to determine, with the aid of the
$x$'s, a number which can be taken to be the value of $\theta$, or a range of numbers which can
be taken to include that value.

Now a single sample, considered by itself, may be rather improbable, and any estimate
based on it may therefore differ considerably from the true value of $\theta$. It appears,
therefore, that we cannot expect to find any method of estimation which can be guaranteed
to give us a close estimate of $\theta$ on every occasion and for every sample. We must
content ourselves with formulating a rule which will give good results “in the long run”
or “on the average”, or which has “a high probability of success”. These phrases
express the fundamental fact that we have to regard our method of estimation as generating
a population of estimates and to assess its merits according to the properties of this
population.

17.4. It will clarify our ideas considerably if we draw a distinction between the
method or rule of estimation, which, following Pitman, we shall call an Estimator, and the
value to which it gives rise in particular cases, the Estimate. The distinction is the same
as that between a function $f(x)$, regarded as defined for a range of the variable $x$, and the
particular value which the function assumes, say $f(a)$, for a specified value of $x$ equal to $a$.
Our problem is not to find estimates, but to find Estimators. We do not reject a method
because it gives a bad result in a particular case (in the sense that the estimate differs
materially from the true value). We should only reject it if it gave bad results in the long
run, that is to say, if the population of possible values of the estimator were seriously
discrepant with the value of $\theta$. The merit of the estimator is judged by the population
of estimates to which it gives rise. It is itself a random variable and has a distribution
to which we shall frequently have occasion to refer.


17.5. In the theory of large samples we have often taken as an estimator of a parameter
$\theta$ a statistic $t$ calculated from the sample in exactly the same way as $\theta$ is calculated
from the population, e.g. the sample-mean is taken as an estimate of the parent mean.
Let us examine how this procedure can be justified. Consider the case when the parent
population is

$$dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x-\theta)^2\}\,dx, \qquad -\infty \le x \le \infty. \tag{17.1}$$

Requiring an estimator for the parent mean $\theta$, we take

$$t = \frac{1}{n}\sum_{j=1}^{n} x_j. \tag{17.2}$$

The distribution of $t$ is

$$dF = \sqrt{\frac{n}{2\pi}} \exp\{-\tfrac{n}{2}(t-\theta)^2\}\,dt, \tag{17.3}$$

that is to say, $t$ is distributed normally about $\theta$ with variance $1/n$. We notice two things
about this distribution: (a) it has a mean (and median and mode) at the true value $\theta$,
and (b) as $n$ increases, the scatter of possible values of $t$ about $\theta$ becomes smaller, so that
the probability that a given $t$ differs by more than a fixed amount from $\theta$ decreases. We
may say that the accuracy of the estimator increases as $n$ increases, or simply with $n$.
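
The two properties noted in 17.5 can be checked by simulation. The following is only an illustrative sketch, not part of the original text: the function name `simulate_mean_variance`, the number of repetitions and the seed are assumptions made for the example. It draws repeated samples from a normal population with unit variance and compares the empirical mean and variance of $t$ with $\theta$ and $1/n$.

```python
import random
import statistics


def simulate_mean_variance(n, theta=0.0, reps=20000, seed=1):
    """Empirical mean and variance of the sample mean t over `reps` samples
    of size n from a normal population with mean theta and unit variance."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.fmean(means), statistics.pvariance(means)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        m, v = simulate_mean_variance(n)
        # The last two columns should be close to one another.
        print(f"n={n:3d}  mean of t={m:+.4f}  var of t={v:.4f}  1/n={1 / n:.4f}")
```

The empirical variance should shrink roughly in proportion to $1/n$ while the centre of the distribution stays at $\theta$.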


17.6. Generally, it will be clear that the phrase “accuracy increasing with n” has
a definite meaning whenever the sampling distribution of $t$ has a variance which decreases
with $1/n$ and a central value which is either identical with $\theta$ or differs from it by a quantity
which also decreases with $1/n$. Many of the estimators with which we are commonly
concerned are of this type, but there are exceptions. Consider, for example, the Cauchy
population

$$dF = \frac{1}{\pi}\,\frac{dx}{1+(x-\theta)^2}, \qquad -\infty \le x \le \infty. \tag{17.4}$$

The mean (assuming that we conventionally agree that it exists) is at $x = \theta$. But if we
try to estimate $\theta$ by the mean-statistic $t$ we have, for the distribution of $t$,

$$dF = \frac{1}{\pi}\,\frac{dt}{1+(t-\theta)^2}, \qquad -\infty \le t \le \infty. \tag{17.5}$$

(Cf. Example 10.1, vol. I, pp. 233-4.) In this case the distribution of $t$ is the same
as that of any single value of the sample, and does not increase in accuracy as $n$ increases.
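
The failure of the mean-statistic for the Cauchy population can be seen in a small experiment. The sketch below is an illustration under stated assumptions, not part of the original text: the helper names, the repetition count and the use of the interquartile range as a measure of scatter (the variance does not exist here) are all choices made for the example. For the population (17.4) the interquartile range of a single observation is 2, and by (17.5) the same should hold for the mean of any number of observations.

```python
import math
import random
import statistics


def cauchy_sample(rng, n, theta=0.0):
    """n values from the Cauchy population (17.4), by inverting the
    distribution function at uniform random numbers."""
    return [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def iqr_of_means(n, reps=20000, seed=1):
    """Interquartile range of the mean-statistic over `reps` samples of size n."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(cauchy_sample(rng, n)) for _ in range(reps))
    return means[(3 * reps) // 4] - means[reps // 4]


if __name__ == "__main__":
    for n in (1, 5, 25, 100):
        # The scatter does not shrink as n grows.
        print(f"n={n:3d}  interquartile range of t = {iqr_of_means(n):.3f}")
```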
Consistence

17.7. The property of possessing increasing accuracy is evidently a very desirable
one; and indeed, if the variance of the sampling distribution decreases with increasing
$n$ it is necessary that its central value should tend to $\theta$, for otherwise the estimator would
have values differing systematically from the true value and would be useless, not to say
dangerous. We therefore formulate our first criterion for a suitable estimator as follows:

An estimator $t_n$, computed from a sample of $n$ values, will be said to be a consistent
estimator of $\theta$ if, for any positive $\varepsilon$ and $\eta$, however small, there is some $N$ such that the
probability that

$$|t_n - \theta| < \varepsilon \tag{17.6}$$

is greater than $1 - \eta$ for all $n > N$. In the notation of the theory of probability,

$$P\{|t_n - \theta| < \varepsilon\} > 1 - \eta, \qquad n > N. \tag{17.7}$$

The definition bears an obvious analogy to the definition of convergence in the
mathematical sense. Given any fixed small quantity $\varepsilon$ we can find a large enough sample number
such that for all samples over that size the probability that $t_n$ differs from the true value
by more than $\varepsilon$ is as near zero as we please. $t_n$ is said to converge in probability to $\theta$. Thus
$t_n$ is a consistent estimate of $\theta$ if it converges to $\theta$ in probability.

Example 17.1

The sample mean is a consistent estimator of the parameter $\theta$ in the population (17.1).
This we have already established in general argument, but more formally the proof would
proceed as follows:

Suppose we are given $\varepsilon$. From (17.3) we see that $(t - \theta)\sqrt{n}$ is distributed normally
about zero with unit variance. Thus the probability that $|(t - \theta)\sqrt{n}| < \varepsilon\sqrt{n}$ is the
value of the normal integral between limits $\pm\varepsilon\sqrt{n}$. Given any positive $\eta$, we can
always take $n$ large enough for this quantity to be greater than $1 - \eta$ and it will continue
to be so for any larger $n$. $N$ may therefore be determined and the inequality (17.7) is
satisfied.
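
The determination of $N$ in Example 17.1 can be made concrete. The following sketch is illustrative only: the function names `coverage` and `smallest_N` are assumptions for the example, and the normal integral between $\pm\varepsilon\sqrt{n}$ is evaluated through the error function as $\operatorname{erf}(\varepsilon\sqrt{n}/\sqrt{2})$.

```python
import math


def coverage(n, eps):
    """P{|t - theta| < eps} for the mean of n values from (17.1):
    the normal integral between -eps*sqrt(n) and +eps*sqrt(n)."""
    return math.erf(eps * math.sqrt(n) / math.sqrt(2.0))


def smallest_N(eps, eta):
    """Smallest N such that the coverage exceeds 1 - eta for every n > N.
    The coverage increases with n, so a simple upward search is enough."""
    n = 0
    while coverage(n + 1, eps) <= 1.0 - eta:
        n += 1
    return n


if __name__ == "__main__":
    for eps, eta in ((0.5, 0.05), (0.1, 0.05), (0.1, 0.01)):
        print(f"eps={eps}, eta={eta}: N = {smallest_N(eps, eta)}")
```

The upward search is linear in $N$, which is adequate for illustration; for very small $\varepsilon$ one would solve $\varepsilon\sqrt{n} = z$ for the appropriate normal deviate $z$ instead.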

Example 17.2

Suppose we have a statistic $t_n$ whose mean value differs from $\theta$ by a quantity of order $n^{-1}$, whose
variance is of order $n^{-1}$ and which tends to normality as $n$ increases. Clearly
$t_n - \theta$ will then tend to zero in probability and $t_n$ will be consistent. This covers
a great many statistics encountered in practice.

Unbiassed Estimators

17.8. The property of consistence is a limiting property, that is to say, it concerns
the behaviour of an estimator as the sample number tends to infinity. It requires nothing
of the behaviour for finite $n$, and if there exists one consistent estimator $t_n$ we may construct
infinitely many others; e.g.

$$\frac{n-a}{n-b}\,t_n$$

is also consistent. We have seen that in some circumstances a consistent estimator of the
mean is the sample mean

$$\bar{x} = \frac{1}{n}\sum x_j. \tag{17.8}$$

But so is

$$x' = \frac{1}{n-1}\sum x_j. \tag{17.9}$$

Why do we prefer one to the other? Intuitively it seems absurd to divide the sum of
$n$ quantities by anything other than their number $n$. We shall see in a moment, however,
that intuition is not a very reliable guide on such matters. There are reasons for preferring

$$\frac{1}{n-1}\sum_{j=1}^{n}(x_j-\bar{x})^2 \tag{17.10}$$

to

$$\frac{1}{n}\sum_{j=1}^{n}(x_j-\bar{x})^2 \tag{17.11}$$

as an estimator of the parent variance, notwithstanding that the latter is the sample
variance.


17.9. Consider the sampling distribution of an estimator $t$. If the estimator is
consistent, its distribution must, for large samples, have a central value in the neighbourhood
of $\theta$. We may choose among the field of consistent estimators by requiring that
$\theta$ shall be equated to this central value not merely for large, but for all samples. Whether
we choose as the appropriate central value the mean, the median or the mode is to some
extent a matter of taste. We shall consider below what follows if we select the mode
(which gives us the maximum likelihood estimators). For the present we discuss the mean.

If we require that for all $n$ the mean value of $t$ shall be $\theta$, we define what is known as
an unbiassed estimator:

$$E(t) = \theta. \tag{17.12}$$

This is an unfortunate word, like so many in statistics. There is nothing except convenience
to exalt the arithmetic mean above other measures of location as a criterion of
bias. We might equally well have chosen the mode as determining the “unbiassed”
estimator, in which case the mean estimator would be “biassed” whenever it gave a different
result. Since the use of “unbiassed” in connection with the mean is fairly widespread,
however, we shall continue to use it.*


* The word has already occurred in vol. I, p. 200, in this sense. It may be spelt with either one
or two s’s. My usage, I am afraid, is not consistent, but in this volume I use two.


Example 17.3

Since

$$E\left\{\frac{1}{n}\sum(x)\right\} = \frac{1}{n}\sum\{E(x)\} = \mu_1',$$

the mean-statistic is an unbiassed estimator of the parent mean whenever the latter exists.
But the sample-variance is not an unbiassed estimator of the parent variance. We have

$$E\left\{\sum(x-\bar{x})^2\right\} = E\left\{\sum x^2 - \frac{1}{n}\Big(\sum x\Big)^2\right\}
= E\left\{\frac{n-1}{n}\sum x_j^2 - \frac{1}{n}\sum_{j\ne k} x_j x_k\right\}
= (n-1)\mu_2' - (n-1)\mu_1'^2 = (n-1)\mu_2.$$

Thus $\frac{1}{n}\sum(x-\bar{x})^2$ has a mean value $\frac{n-1}{n}\mu_2$. On the other hand, an unbiassed estimator
is given by

$$\frac{1}{n-1}\sum(x-\bar{x})^2,$$

and for this reason it is sometimes preferred to the sample variance. There are other
reasons which will appear when we come to study the analysis of variance.
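
The result of Example 17.3 can be verified exactly for a small discrete parent. The sketch below is an illustration, not part of the original text: the function name and the three-valued population are assumptions for the example. It enumerates every equally likely sample of size $n$ drawn with replacement and computes the expectations with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def expected_variance_estimates(values, n):
    """Enumerate all len(values)**n equally likely samples of size n drawn
    with replacement. Return (mu_2, E[S/n], E[S/(n-1)]), where
    S = sum (x - xbar)^2 and mu_2 is the variance of the parent."""
    size = len(values)
    mu1 = Fraction(sum(values), size)
    mu2 = sum((Fraction(v) - mu1) ** 2 for v in values) / size
    total = Fraction(0)
    for sample in product(values, repeat=n):
        xbar = Fraction(sum(sample), n)
        total += sum((Fraction(x) - xbar) ** 2 for x in sample)
    mean_S = total / size ** n
    return mu2, mean_S / n, mean_S / (n - 1)


if __name__ == "__main__":
    mu2, biased, unbiased = expected_variance_estimates([1, 2, 6], n=3)
    print("parent variance      :", mu2)
    print("E[sample variance]   :", biased)    # equals (n-1)/n times mu_2
    print("E[divisor n-1 form]  :", unbiased)  # equals mu_2
```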

Efficient Estimators 

17.10. In general there will exist more than one consistent estimator of a parameter,
even if we confine ourselves only to unbiassed estimators. Consider once again the estimation
of the mean of a normal population with known variance. The sample mean is
consistent and unbiassed. We will now prove that the same is true of the median.

Consideration of symmetry is enough to show that the median is an unbiassed estimate
of the parent mean, which is, of course, the same as the parent median. For large $n$ the
distribution of the median tends to the normal form (cf. Example 9.7, vol. I, p. 213),

$$dF \propto \exp\{-2n f_1^2 (x-\theta)^2\}\,dx, \tag{17.13}$$

where $f_1$ is the median ordinate of the parent, in our present case $1/\sqrt{2\pi} = 0.3989$. The
variance tends to zero and the estimator is consistent. Its variance is $1/(4nf_1^2) = \pi/2n$.

17.11. We are therefore at liberty to seek for further criteria to choose between
estimators with the common property of consistence. Such a criterion arises naturally
if we consider the sampling variances of the estimators. Generally speaking, the estimator
with the smaller variance will be grouped more closely round the value $\theta$; this will certainly
be so for distributions of the normal type. An estimator with a smaller variance will
therefore deviate less, on the average, from the true value than one with a larger variance.
Hence we may reasonably regard it as better or more efficient.

If, of two consistent estimators $t_1$ and $t_2$, we have var $t_1$ < var $t_2$ for all $n$, then $t_1$ is
more efficient than $t_2$ for all sample sizes. It is possible to have var $t_1$ < var $t_2$ for some
ranges of $n$ and var $t_1$ > var $t_2$ for others, in which case the estimators are more or less
efficient in different ranges.

In the case of mean and median we have, for any $n$,

$$\text{var (mean)} = \frac{\sigma^2}{n}, \tag{17.14}$$

and for large $n$

$$\text{var (median)} = \frac{\pi\sigma^2}{2n}, \tag{17.15}$$

where $\sigma^2$ is the parent variance. Since $\pi/2 = 1.57 > 1$ the mean is more efficient than
the median for large $n$ at least. For small $n$ we have to work out the variance of the median.
The following values, expressed in units of $\sigma^2/n$, may be obtained from those given in Table XXIII of Tables for
Statisticians and Biometricians, Part II:

    n                                   2      3      4      5
    var (median) in units of σ²/n       1.00   1.35   1.19   1.44

It appears that the mean is never less efficient than the median in estimating the parameter
$\theta$ for the normal distribution (17.1); for $n = 2$ the two statistics coincide.
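
The small-sample figures for the median can be approximated by simulation. The following sketch is illustrative and rests on assumptions not in the original text: the function name, the repetition count and the seed are arbitrary, and the result is the variance of the median of $n$ values from (17.1) multiplied by $n$, so that it is on the scale of $\sigma^2/n$ with $\sigma = 1$.

```python
import random
import statistics


def scaled_median_variance(n, reps=40000, seed=1):
    """n times the empirical variance of the median of n standard normal
    values, i.e. var(median) measured in units of sigma^2 / n."""
    rng = random.Random(seed)
    medians = [
        statistics.median([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return n * statistics.pvariance(medians)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"n={n}  var(median) / (sigma^2/n) ~ {scaled_median_variance(n):.3f}")
```

The simulated values should agree with the tabulated ones to within sampling error. For $n = 2$ the median is the mean of the two observations, so the ratio is exactly unity apart from simulation noise.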





Example 17.4

For the Cauchy distribution

$$dF = \frac{1}{\pi}\,\frac{dx}{1+(x-\theta)^2}, \qquad -\infty \le x \le \infty,$$

we have already seen that the sample mean is not a consistent estimator. However, for
the median in large samples we have, since the median ordinate is $1/\pi$,

$$\text{var (median)} = \frac{\pi^2}{4n}.$$

It is seen that the median is consistent, and although direct comparison with the mean
is not possible because the latter does not possess a sampling variance, the median is evidently
a better estimator for $\theta$ than the mean. This provides an interesting contrast with
the case of the normal parent, particularly in view of the similarity of the parent frequency-
distributions.
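
The large-sample variance $\pi^2/4n$ of the Cauchy median can be compared with simulation. This is a hedged sketch rather than anything from the original text: the function name, the sample size and the repetition count are assumptions, and since the formula is a limiting one only rough agreement is to be expected for finite $n$.

```python
import math
import random
import statistics


def scaled_cauchy_median_variance(n, theta=0.0, reps=5000, seed=1):
    """n times the empirical variance of the median of n Cauchy values;
    for large n this should be near pi^2 / 4."""
    rng = random.Random(seed)
    medians = []
    for _ in range(reps):
        xs = [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
        medians.append(statistics.median(xs))
    return n * statistics.pvariance(medians)


if __name__ == "__main__":
    print("simulated n * var(median):", round(scaled_cauchy_median_variance(201), 3))
    print("limiting value pi^2 / 4  :", round(math.pi ** 2 / 4, 3))
```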


17.12. In some cases, as we shall see below, there exist consistent estimators whose
sampling variance for large samples is less than that of any other such estimator. We
shall call such estimators most-efficient. When they exist they provide a standard of
measurement of efficiency. In fact, if $t_2$ has variance $v_2$ and the most-efficient estimator
$t_1$ has variance $v_1$, the efficiency $E$ of $t_2$ is defined as

$$E = \frac{v_1}{v_2}. \tag{17.16}$$

It will be seen later that in normal samples the mean is a most-efficient estimator, so that
the efficiency of the median for such samples is, from (17.14) and (17.15),

$$E = \frac{\sigma^2}{n} \Big/ \frac{\pi\sigma^2}{2n} = \frac{2}{\pi} = 0.637.$$

17.13. If we have a sample of 100 members the variance of the median (assuming
normality) will be about the same as that of the mean in only 64 members. Thus, if
sampling variance be accepted as a criterion of accuracy of estimation, the use of the median
instead of the mean sacrifices about 36 observations in 100. It is not possible to economise
by using a different estimator than the mean.

Other things being equal, the estimator with the greater efficiency is undoubtedly
the one to use. But sometimes other things are not equal. It may, and does, happen
that a most-efficient estimate derived from $t_1$ is more troublesome to calculate than an
alternative $t_2$. The extra labour involved in calculation may be greater than the saving
in dealing with a smaller sample number, particularly if there are plenty of further
observations to hand.


Example 17.5

Consider the estimation of the standard deviation of a normal population with variance
$\sigma^2$ and unknown mean. Two possible estimators are the standard deviation of the sample
(or the square-root of $\sum(x-\bar{x})^2/(n-1)$ if it is desired to use an unbiassed estimator)
and the mean deviation of the sample multiplied by $\sqrt{\pi/2}$ (cf. 5.20). The latter is
easier to calculate, as a rule, and if we have plenty of observations (as, for example, if we
are finding the standard deviation of a set of barometric records and the addition of further
members to the sample is merely a matter of turning up more records) it may be worth
while estimating from the mean-deviation rather than from the standard deviation.

In normal samples the variance of the mean-deviation is (9.13)

$$\frac{2\sigma^2}{n\pi}\left(1-\frac{1}{n}\right)\left\{\frac{\pi}{2} + \sqrt{n(n-2)} - n + \arcsin\frac{1}{n-1}\right\}. \tag{17.17}$$

The variance of the estimator from the mean deviation is then approximately

$$\frac{\sigma^2}{2n}(\pi-2). \tag{17.18}$$

Now the variance of the standard deviation is $\sigma^2/2n$ (9.22) and we shall see later that it
is a most-efficient estimator. Thus the efficiency of the estimator based on the mean deviation is

$$\frac{\sigma^2}{2n} \Big/ \frac{\sigma^2(\pi-2)}{2n} = \frac{1}{\pi-2} = 0.876.$$

The accuracy of the estimate from the mean deviation of a sample of 1000 is then about
the same as that from the standard deviation of a sample of 876. If it is easier to calculate
the m.d. of 1000 observations than the s.d. of 876 and there is no shortage of observations,
it may be more convenient to use the former.

It has to be remembered, nevertheless, that in adopting such a procedure we are
deliberately wasting information. By taking greater pains we could improve the efficiency
of our estimate from 0.876 to unity, or by about 14 per cent of the former value.
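
The passage from (17.17) to the efficiency 0.876 can be followed numerically. The sketch below is illustrative only: the function names are assumptions, the formula coded is (17.17) as quoted in this example, and the comparison uses the large-sample variance $\sigma^2/2n$ of the standard deviation, so the ratio is meaningful only for large $n$.

```python
import math


def var_mean_deviation(n, sigma=1.0):
    """Variance of the mean deviation in normal samples of size n >= 2,
    using the expression (17.17)."""
    bracket = (math.pi / 2 + math.sqrt(n * (n - 2)) - n
               + math.asin(1.0 / (n - 1)))
    return 2 * sigma ** 2 / (n * math.pi) * (1 - 1 / n) * bracket


def efficiency_of_md_estimator(n, sigma=1.0):
    """Ratio of the large-sample variance of the s.d., sigma^2 / (2n), to the
    variance of the estimator sqrt(pi/2) * (mean deviation)."""
    var_estimator = (math.pi / 2) * var_mean_deviation(n, sigma)
    var_sd = sigma ** 2 / (2 * n)
    return var_sd / var_estimator


if __name__ == "__main__":
    for n in (10, 100, 1000, 10 ** 6):
        print(f"n={n:>7d}  ratio = {efficiency_of_md_estimator(n):.4f}")
    print("limit 1/(pi - 2) =", round(1 / (math.pi - 2), 4))
```

For $n = 2$ the mean deviation is half the absolute difference of the two observations, whose variance is $\sigma^2(\tfrac{1}{2} - 1/\pi)$; (17.17) reproduces this value, which gives a useful check on the formula.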


Sufficient Estimators 

17.14. The comparison of the efficiencies of two estimators, as measured by their
variances, may be made for any $n$, but the absolute efficiency as defined in 17.12 by relation
to a most-efficient estimator is in the main a limiting property. We shall see below (17.36)
that the definition may be extended to small samples and to non-normal variation, but
most-efficient estimators for finite $n$ do not exist so frequently in statistical practice
as in the limiting case of large samples. Sometimes, however, there are estimators which
may be regarded as the “best” for samples of any size, and we proceed to consider
them.

Before doing so, we prove that, in the limit, all most-efficient estimators tend to
equivalence.

More precisely, if two most-efficient estimators $t_1$ and $t_2$ tend in the limit to be
distributed in the bivariate form

$$dF \propto \exp\left[-\frac{(t_1-\theta)^2 - 2\rho(t_1-\theta)(t_2-\theta) + (t_2-\theta)^2}{2v(1-\rho^2)}\right] dt_1\,dt_2, \tag{17.19}$$

then the correlation $\rho = 1$. Here $v$ is the variance of each estimator.

Consider the estimator

$$u_1 = \tfrac{1}{2}(t_1 + t_2).$$

Clearly $u_1$ is consistent since $t_1$ and $t_2$ are both so. Putting

$$u_2 = \tfrac{1}{2}(t_1 - t_2)$$

we have, for the joint distribution of $u_1$ and $u_2$,

$$dF \propto \exp\left[-\frac{1}{2v(1-\rho^2)}\left\{2(1-\rho)(u_1-\theta)^2 + 2(1+\rho)u_2^2\right\}\right] du_1\,du_2. \tag{17.20}$$

Thus $u_2$ is distributed independently of $u_1$ and $\theta$ and we have

$$\text{var } u_1 = \frac{v(1-\rho^2)}{2(1-\rho)} = \frac{1+\rho}{2}\,v. \tag{17.21}$$

Now $t_1$ is a most-efficient estimator and hence

$$\frac{1+\rho}{2}\,v = \text{var } u_1 \ge \text{var } t_1 = v,$$

giving

$$\frac{1+\rho}{2} \ge 1. \tag{17.22}$$

But $\rho$ cannot be greater than unity and hence $\rho = 1$, which proves the theorem.

17.15. Consider once again the estimation of $\theta$ in the normal population (17.1).
The joint distribution of the sample is given by

$$dF = \frac{1}{(2\pi)^{n/2}} \exp\left\{-\tfrac{1}{2}\sum_{j=1}^{n}(x_j-\theta)^2\right\} dx_1 \ldots dx_n. \tag{17.23}$$

We have the familiar result

$$\sum_{j=1}^{n}(x_j-\theta)^2 = \sum(x-\bar{x})^2 + n(\bar{x}-\theta)^2$$

and hence

$$dF = \frac{1}{(2\pi)^{n/2}} \exp\{-\tfrac{n}{2}(\bar{x}-\theta)^2\}\,\exp\{-\tfrac{1}{2}\sum(x-\bar{x})^2\}\,dx_1 \ldots dx_n. \tag{17.24}$$

Thus the frequency function of the distribution of the $x$'s (which is equivalent to the likelihood
function) can be factorised into two parts, one depending on $\bar{x}$ and $\theta$, the other depending
on the $x$’s but not on $\theta$.

The quantity $\bar{x}$ is then said to be a sufficient estimator of $\theta$; and generally, if the
likelihood function is expressible in the form (as a product of two frequency functions)

$$L(x_1, \ldots, x_n, \theta) = L_1(t, \theta)\,L_2(x_1, \ldots, x_n), \tag{17.25}$$

where $L_1$ does not contain the $x$’s otherwise than in the form $t$ and $L_2$ is independent of $\theta$,
$t$ is said to be a sufficient estimator of $\theta$.
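
The factorisation in 17.15 can be checked numerically on the logarithmic scale. The sketch below is an illustration with assumed function names, not part of the original text. It verifies that the log-likelihood of a normal sample with unit variance is the sum of a part depending only on the sample mean and $\theta$ and a part free of $\theta$, so that two samples with the same mean give the same change in log-likelihood between any two values of $\theta$.

```python
import math


def log_likelihood(xs, theta):
    """Logarithm of the joint frequency function of the sample, as in (17.23)."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - theta) ** 2 for x in xs)


def log_factors(xs, theta):
    """Logarithms of the two factors in (17.24): the first depends on the
    sample only through its mean, the second does not involve theta."""
    n = len(xs)
    xbar = sum(xs) / n
    log_l1 = -0.5 * n * (xbar - theta) ** 2
    log_l2 = -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - xbar) ** 2 for x in xs)
    return log_l1, log_l2


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], [0.0, 2.0, 4.0]  # different samples, same mean
    for xs in (a, b):
        change = log_likelihood(xs, 0.5) - log_likelihood(xs, 3.0)
        print(xs, "change in log L between theta = 0.5 and 3.0:", round(change, 6))
```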


17.16. As so defined, a sufficient estimator, if it exists at all, is unique except that
if $t$ obeys the relation (17.25) any function of $t$ will obviously also obey the same relation.
From all such functions we must evidently choose one which gives a consistent estimator
and can sometimes, as in the example of the previous section, find the estimator which is
unbiassed. Apart from such ambiguities, which offer no difficulties in practice, the property
of uniqueness holds. For if $t_1$ and $t_2$ were two different sufficient statistics, not functionally
related, we should have

$$L_1(t_1, \theta)\,L_2(x_1, \ldots, x_n) = M_1(t_2, \theta)\,M_2(x_1, \ldots, x_n),$$

and hence

$$\frac{L_1(t_1, \theta)}{M_1(t_2, \theta)} = \frac{M_2}{L_2}. \tag{17.26}$$

Since the expression on the right does not contain $\theta$, $L_1$ must be a factor of $M_1$ and moreover
the quotient must be a constant; for if it were a function of the $x$’s that function
would have been assimilated to $L_2$ or $M_2$. Hence

$$L_1(t_1, \theta) = k\,M_1(t_2, \theta),$$

and this cannot be so unless $t_1$ and $t_2$ are functionally related.


17.17. The fundamental property of sufficient estimators derives from the following
theorem:

If $t_1$ is sufficient and $t_2$ is any other estimator of $\theta$ (not a function of $t_1$) the joint
distribution of $t_1$ and $t_2$ may be put in the form

$$dF = f_1(t_1, \theta)\,f_2(t_1, t_2)\,dt_1\,dt_2, \tag{17.27}$$

where $f_2$ does not contain $\theta$. Conversely, if (17.27) holds for every $t_2$ then $t_1$ is sufficient.

Before proving this result let us notice its importance. From (17.27) it follows that
for any given $t_1$ the distribution of $t_2$ is equal to $f_2(t_1, t_2)\,dt_2$, i.e. is independent of $\theta$.
Consequently, if we know $t_1$, the probability of any range of values of $t_2$ is the same for all $\theta$.
The distribution of $t_2$ given $t_1$, therefore, can throw no light whatever on $\theta$. Thus, a knowledge
of $t_1$ gives all the information that the sample can supply about $\theta$ and no other
estimator can add anything to it. We are clearly justified in such circumstances in
describing a sufficient estimator as the “best”.

Now as to the theorem itself. The direct part is easily proved. In fact, we have from
(17.25)

$$L(x_1, \ldots, x_n, \theta)\,dx_1 \ldots dx_n = L_1(t_1, \theta)\,L_2(x_1, \ldots, x_n)\,dx_1 \ldots dx_n.$$

Make the transformation

$$y_1 = t_1(x_1, \ldots, x_n), \quad y_2 = t_2(x_1, \ldots, x_n), \quad y_3 = x_3, \quad \ldots, \quad y_n = x_n. \tag{17.28}$$

The element of frequency becomes

$$L_1(t_1, \theta)\,L_2(x_1, \ldots, x_n)\,\frac{\partial(x_1, x_2)}{\partial(t_1, t_2)}\,dy_1 \ldots dy_n, \tag{17.29}$$

where the $t$’s and $x$’s are to be expressed in terms of the $y$’s. We have excluded the case
when $t_2$ is functionally related to $t_1$, and hence the Jacobian $\partial(x_1, x_2)/\partial(t_1, t_2)$ does not
vanish identically. The frequency element of $y_1$ and $y_2$ is then obtained from (17.29) by
integrating out the other variables. Since $y_1$ and $y_2$ are equal respectively to $t_1$ and $t_2$
this process will leave unchanged the function $L_1(t_1, \theta)$ and reduce the other part to a
function of $t_1$ and $t_2$, say $f_2(t_1, t_2)$. Writing $f_1$ for $L_1$ we then have

$$dF = f_1(t_1, \theta)\,f_2(t_1, t_2)\,dt_1\,dt_2,$$

as stated in the theorem.

The converse is a little more difficult. Let $t_1$ be sufficient and make the transformation
$y_1 = t_1$, $y_2 = x_2$, etc. The joint distribution of sample values becomes

$$L(x_1, \ldots, x_n) = L'(t_1, y_2, \ldots, y_n)\,\frac{\partial t_1}{\partial x_1}. \tag{17.30}$$

Since $t_1$ is independent of $\theta$, so is $\partial t_1/\partial x_1$. Hence, if the distribution of $t_1$ is $f(t_1)\,dt_1$, $L'$ may
be written

$$f(t_1)\,L''(t_1, y_2, \ldots, y_n), \tag{17.31}$$

and the converse will be established if we can show that $L''$ does not contain $\theta$. This we
do by demonstrating that if there are values $y_2', \ldots, y_n'$ for which $L''$ assumes different
values for different values of $\theta$ then the joint distribution of $t_1$ and $t_2$ cannot be independent
of $\theta$, which contradicts our hypothesis.

Suppose, then, that for two values of $\theta$, say $\theta_1$ and $\theta_2$,

$$L''(t_1, y_2', \ldots, y_n')_{\theta_1} = L''(t_1, y_2', \ldots, y_n')_{\theta_2} + 2a, \tag{17.32}$$

where $a$ is not zero. Consider a new statistic $t_2$ defined by

$$t_2^2 = \sum (y - y')^2. \tag{17.33}$$

Assuming that $L''$ is continuous in the $y$’s, we may determine a value of $t_2$, say $t_3$, such that

$$L''(t_1, y_2, \ldots, y_n)_{\theta_1} > L''(t_1, y_2, \ldots, y_n)_{\theta_2} + a \tag{17.34}$$

everywhere inside the range of values bounded by

$$t_3^2 = \sum (y - y')^2.$$

Then for any fixed $t_1$ the total frequency inside this range is obtained by integrating $L''$
over the appropriate values, and we shall find, in virtue of (17.34),

$$f_{\theta_1} > f_{\theta_2}, \tag{17.35}$$

the $f$’s referring to total frequencies.

But if the joint distribution of $t_1$ and $t_2$ is

$$dF = f_1(t_1, \theta)\,f_2(t_1, t_2)_{\theta}\,dt_1\,dt_2,$$

we have for the frequencies $f$,

$$f_{\theta_1} = \int_0^{t_3} f_2(t_1, t_2)_{\theta_1}\,dt_2, \qquad f_{\theta_2} = \int_0^{t_3} f_2(t_1, t_2)_{\theta_2}\,dt_2,$$

and hence

$$\int_0^{t_3} \{f_2(t_1, t_2)_{\theta_1} - f_2(t_1, t_2)_{\theta_2}\}\,dt_2 > 0,$$

so that the joint distribution cannot be independent of $\theta$.

The proof applies to the case when the frequency functions are continuous.
In the discontinuous case the argument simplifies and we leave it to the reader
to supply the proof.

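17.18. There is an important further result to the effect that a sufficient
estimator is most-efficient, provided that a most-efficient estimator exists. We assume
that the joint distribution of the sufficient estimator $t_1$ and any other estimator $t_2$ tends
to normality for large $n$, say in the form

$$dF \propto \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{(t_1-\theta)^2}{v_1} - \frac{2\rho(t_1-\theta)(t_2-\theta)}{\sqrt{v_1 v_2}} + \frac{(t_2-\theta)^2}{v_2}\right\}\right] dt_1\,dt_2, \tag{17.36}$$

where $v_1$ and $v_2$ are the variances of $t_1$ and $t_2$ respectively. Since $t_1$ is sufficient, the
distribution of $t_2$ given $t_1$ does not contain $\theta$. Now the distribution of $t_1$ is

$$dF \propto \exp\left\{-\frac{(t_1-\theta)^2}{2v_1}\right\} dt_1, \tag{17.37}$$

and hence that of $t_2$ given $t_1$ is

$$dF \propto \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{(t_1-\theta)^2}{v_1} - \frac{2\rho(t_1-\theta)(t_2-\theta)}{\sqrt{v_1 v_2}} + \frac{(t_2-\theta)^2}{v_2}\right\} + \frac{(t_1-\theta)^2}{2v_1}\right] dt_2,$$

which reduces to

$$dF \propto \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{t_2-\theta}{\sqrt{v_2}} - \frac{\rho(t_1-\theta)}{\sqrt{v_1}}\right\}^2\right] dt_2. \tag{17.38}$$

If this is not to involve $\theta$ we must have

$$\frac{1}{\sqrt{v_2}} = \frac{\rho}{\sqrt{v_1}}, \quad \text{i.e.} \quad \rho = \sqrt{\frac{v_1}{v_2}} = \sqrt{E}, \tag{17.39}$$

where $E$ is the efficiency of $t_2$.

Since $\rho \le 1$ it follows that $v_1 \le v_2$, i.e. $t_1$ has a variance no greater than that of any other estimator.
Consequently, if there exists a most-efficient statistic, $t_1$ itself is most-efficient.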
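
The relation (17.39) between the correlation and the efficiency may be illustrated with the mean and median of normal samples, for which the efficiency of the median is $2/\pi$ in large samples, so that the correlation should be near $\sqrt{2/\pi} = 0.798$. The sketch below is only an illustration: the function name, the sample size 101 and the repetition count are assumptions for the example.

```python
import math
import random
import statistics


def mean_median_correlation(n, reps=5000, seed=1):
    """Empirical correlation between the mean and median of normal samples of
    size n, and the empirical efficiency var(mean) / var(median)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    m1, m2 = statistics.fmean(means), statistics.fmean(medians)
    cov = sum((a - m1) * (b - m2) for a, b in zip(means, medians)) / reps
    v1, v2 = statistics.pvariance(means), statistics.pvariance(medians)
    return cov / math.sqrt(v1 * v2), v1 / v2


if __name__ == "__main__":
    rho, eff = mean_median_correlation(101)
    print(f"correlation = {rho:.3f}   sqrt(efficiency) = {math.sqrt(eff):.3f}")
    print(f"sqrt(2/pi)  = {math.sqrt(2 / math.pi):.3f}")
```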


17.19. The criterion of sufficiency is not a limiting property. A sufficient estimator
is best for any sample size since it gives all the information about $\theta$ that the sample can
give; and it is most-efficient for large samples. If we could always find a sufficient
estimator our problem would be solved, but unfortunately sufficiency is the exception
rather than the rule.


Example 17.6

The frequency element of a sample of $n$ from the population

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-m)^2}{2\sigma^2}\right\} dx$$

can be put in the form

$$dF = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{n(\bar{x}-m)^2}{2\sigma^2}\right\} \cdot \frac{n^{(n-1)/2}}{2^{(n-3)/2}\,\sigma^{n-1}\,\Gamma\{\tfrac{1}{2}(n-1)\}}\; e^{-ns^2/2\sigma^2}\, s^{n-2}\, d\bar{x}\, ds.$$

(Cf. Example 10.5, vol. I, p. 238.)

If we know $\sigma$, then, as we have already seen, $\bar{x}$ is sufficient for $m$. But if we know
$m$, $s$ is not sufficient for $\sigma$. In fact, the factorisation in the above equation requires the
appearance of $\sigma$ in the element relating to $\bar{x}$, and we cannot separate a factor containing
$s$ and $\sigma$ alone or the remaining variables alone.

This is what we might expect. If we know the real mean $m$ there is little point in
preferring the sample variance

$$s^2 = \frac{1}{n}\sum(x-\bar{x})^2$$

to the second moment

$$s'^2 = \frac{1}{n}\sum(x-m)^2$$

as an estimator of the parent variance. The distribution of $s'$ is given by

$$dF = \frac{n^{n/2}}{2^{(n-2)/2}\,\sigma^{n}\,\Gamma(\tfrac{1}{2}n)}\; e^{-ns'^2/2\sigma^2}\,(s')^{n-1}\,ds',$$

and this embodies the whole of the frequency element of the sample, apart from differentials
in the other variables. Thus $s'$ is sufficient for $\sigma$.

17.20. This completes the first stage of our inquiry. The criteria of consistence, 
efficiency and sufficiency provide standards which we shall look for in “ good ” estimators. 
Of themselves, however, they do not provide any systematic way of deriving estimators 
which obey them. We shall now consider various methods which have been proposed for 
providing estimators and examine how far they conform to our criteria. The most 
important method is that of maximum likelihood, which will occupy the remainder of this 
chapter. In the next chapter we shall consider four others, the method of minimum 
variance, the method of minimum $\chi^2$, the method of least squares, and the method of
inverse probability. 


Maximum Likelihood 

17.21. If the frequency function of the parent population is $f(x, \theta)$, the likelihood
function of a sample of $n$ is, by definition,

$$L = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta). \tag{17.40}$$

The Principle of Maximum (or Maximal) Likelihood then states that if there exists a statistic
$t = t(x_1, \ldots, x_n)$ which maximises $L$ for variations of $\theta$, then $t$ is to be taken as an
estimator of $\theta$. In short, $t$ is the solution (if any) of

$$\frac{\partial L}{\partial\theta} = 0, \qquad \frac{\partial^2 L}{\partial\theta^2} < 0. \tag{17.41}$$

Since $L$ is positive, the first equation is equivalent to

$$\frac{1}{L}\frac{\partial L}{\partial\theta} = \frac{\partial \log L}{\partial\theta} = 0, \tag{17.42}$$

a form which is frequently more convenient.
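
The definition can be illustrated numerically. The sketch below is a minimal illustration, not a method from the text: it forms $\log L$ as a sum of log frequency functions and maximises it by a golden-section search, on the assumption that $\log L$ has a single maximum in the search interval. The function names, the unit-variance normal population and the four data values are all hypothetical choices made for the illustration.

```python
import math

def log_likelihood(log_f, xs, theta):
    """log L: the sum of log f(x_i, theta), i.e. the logarithm of (17.40)."""
    return sum(log_f(x, theta) for x in xs)

def maximise_log_likelihood(log_f, xs, lo, hi, tol=1e-8):
    """Golden-section search for the theta in [lo, hi] that maximises log L.

    Assumes (for this sketch only) that log L has a single maximum in [lo, hi].
    """
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if log_likelihood(log_f, xs, c) > log_likelihood(log_f, xs, d):
            b = d   # the maximum lies in [a, d]
        else:
            a = c   # the maximum lies in [c, b]
    return 0.5 * (a + b)

def normal_log_f(x, m):
    """Log frequency function of a normal population with mean m and unit variance."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - m) ** 2

sample = [1.2, 0.7, 2.1, 1.5]          # illustrative data, not from the text
m_hat = maximise_log_likelihood(normal_log_f, sample, -10.0, 10.0)
print(m_hat, sum(sample) / len(sample))
```

For this population the numerical maximum should agree, to within the search tolerance, with the sample mean, which is the explicit solution of the likelihood equation.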

There is one small point to notice here. In our usual convention, if a frequency
function has a finite range, we regard it as defined from $-\infty$ to $+\infty$ but as zero outside
that range. In this chapter we shall occasionally meet the reciprocal of $f$, which is undefined
for zero $f$. Unless the contrary is specified we shall suppose that where $f$ is zero $1/f$ is also
to be regarded as zero. This will enable us to continue to regard the range as infinite, but
some care is necessary where $f$ is assumed everywhere continuous, for discontinuities may
appear in $f$ and $1/f$ at the terminals of the finite range. The point becomes important
when we try to make certain existence theorems rigorous.


17.22. In sections 7.27 to 7.31 we touched on the principle of maximum likelihood 
from the point of view of statistical logic. We pointed out that its adoption required a 
new postulate in the theory of inference, but referred to the fact that the principle was
recommended by the statistical properties of the estimators to which it leads. We now 
proceed to prove a series of theorems about these estimators, from which it will be seen 
that the posterior recommendation, so to speak, is very strong. In fact, maximum 
likelihood estimators are consistent, tend to normality for large n, have minimum variance 
in the limit at least, and provide sufficient statistics where such exist. 


17.23. The reader may feel convinced intuitively that maximum likelihood estimators
are consistent, in which case he can pass to the next section. We shall now prove the
result formally.

(a) If the frequency function $f(x, \theta)$ is continuous in $x$ throughout its range, and

(b) if $f(x, \theta)$ is continuous and monotonic in $\theta$ in some $\theta$-interval containing the true
value of $\theta$, say $\theta_0$, and for all $x$ in some $x$-interval,
then the maximum likelihood estimator of $\theta$, say $t$, is consistent.

Our proof will also cover the case of discontinuous variates which can be reduced to 
the continuous case by replacing each value by an interval in which the frequency is 
uniformly distributed. 

We first eliminate an inconvenience due to the infinitude of the range. In fact, if the 
range is infinite we make the variate transformation $x = \tan y$. The conditions (a) and (b)
remain true of y, and the maximum likelihood estimator in x transforms to that in y. We 
may therefore take the range as finite. 

The next step is to reduce the case to one of grouped frequencies by dividing the range
into $m$ intervals, the width of the $j$th interval being $l_j$. (We shall decide on the actual
values of the $l$'s below.) Writing

$$f_j = \int_{l_j} f(x, \theta)\, dx, \tag{17.43}$$

we have, in virtue of the continuity of $f$ in $x$, that $f_j/l_j$ differs as little as we please from
$f(x_j, \theta)$. Then if $L'$ is the likelihood of the grouped data, proportional to

$$\prod_{j=1}^{m} \left(\frac{f_j}{l_j}\right)^{n_j}, \tag{17.44}$$

where $n_j$ is the number of observations in the $j$th interval, we have, except for constants,

$$\log L' = \sum_{j=1}^{m} n_j \log f_j - \sum_{j=1}^{m} n_j \log l_j \tag{17.45}$$

and this will differ arbitrarily little from the logarithm of the true likelihood

$$\sum_{i=1}^{n} \log f(x_i, \theta) \tag{17.46}$$

provided that we take $m$ large enough and the $l$'s in consequence small enough.

Hence we see that if $t$ is the estimator which maximises $L$ and $t'$ that which maximises
$L'$, in virtue of hypothesis (b) that $L$ and $L'$ are continuous in $\theta$, $t$ and $t'$ will differ as little
as we please for any given values of the $x$'s and that uniformly. We may therefore prove
our theorem for the finite number of variables $n_j$ and infer its truth for the continuous
case by proceeding to the limit.

In different samples the $n_j$ will vary, subject only to the condition that $\sum (n_j) = n$.
Let us choose the ranges $l_j$ such that $f_j(\theta_0) = 1/m$ for all $j$, that is to say, such that the
frequencies in all intervals are equal when $\theta$ takes its true value $\theta_0$. Consider the likelihood
function

$$K = \sum_{j=1}^{m} n_j \log z_j, \tag{17.47}$$

where the $z$'s are subject only to the condition

$$\sum (z) = 1. \tag{17.48}$$

We consider three values of $K$ defined by particular values of the $z$'s.

(a) When $z_j = n_j/n$, $K$ is a maximum, say $K_R$. For we have

$$\delta K = \sum \frac{n_j}{z_j}\, dz_j = 0, \qquad 0 = \sum dz_j,$$

and hence

$$\frac{n_1}{z_1} = \frac{n_2}{z_2} = \cdots = \frac{\sum (n)}{\sum (z)} = n.$$

(b) When $z_j = f_j(\theta_0) = 1/m$, $K$ is, say, $K_M$.

(c) When the estimator $t'$ assumes the value, say $t_0$, corresponding to the $n_j$'s, and
hence $z_j = f_j(t_0)$, $K$ is a maximum, say $K_Z$, among the particular set of values of $\theta$ for
which $z_j = f_j(\theta)$; for this is our definition of $t'$.

We have at once that

$$K_R \ge K_Z \ge K_M. \tag{17.49}$$

Now, as the sample increases, the observed $n_j/n$ converge in probability to their
theoretical values $f_j(\theta_0) = 1/m$. Since $K$ is continuous in the $z$'s, $K_R - K_M$ will converge
to zero in probability and, from (17.49), so will $K_R - K_Z$.

Now we show that this entails that each of

$$|f_j(t_0) - f_j(\theta_0)|$$

converges to zero in probability. In fact, since $|f_j(\theta_0) - n_j/n|$ does so, it will be enough
to prove that the same holds for

$$\left|\frac{n_j}{n} - f_j(t_0)\right|. \tag{17.50}$$


Let $K_1$ be the maximum of $K$ for some fixed $z_1$. Then $K_R \ge K_1$ and

$$K_R - K \ge K_R - K_1 .$$

Hence $K_R - K_1$ converges to zero when $K_R - K$ does so. The maximum is readily seen to be given by

$$z_j = \frac{n_j (1 - z_1)}{n - n_1}, \qquad j = 2, \ldots, m, \tag{17.51}$$

$$K_1 = n_1 \log z_1 + (n - n_1)\{\log (1 - z_1) - \log (n - n_1)\} + \sum_{j=2}^{m} n_j \log n_j. \tag{17.52}$$

Now $z_1$ is a double-valued function of $K_1$, continuous and having its two values equal
for $K_1 = K_R$; for $\partial K_1/\partial z_1$ is continuous in $z_1$ from 0 to 1 (not inclusive), and changes sign
only for $z_1 = n_1/n$, where $K_1 = K_R$. It follows that when $K_R - K_1$ is small, so is
$z_1 - n_1/n$. If the other $z$'s are not given by (17.51), $z_1 - n_1/n$ is smaller still.

A similar argument applies for any $j$, and hence $|z_j - n_j/n|$ converges to zero in probability
when $K_R - K$ does so. Taking $z_j = f_j(t_0)$ and remembering that in this case $K$
becomes $K_Z$, we reach (17.50).

Finally, by hypotheses (a) and (b) at least some of the $f_j(\theta)$ have continuous inverse
functions expressing $\theta$ in terms of the functions $f$, and hence by taking

$$|f_j(t_0) - f_j(\theta_0)|$$

as small as we please, we may make $t_0 - \theta_0$ as small as we please. Consequently $t'$
converges to $\theta_0$ in probability and is consistent.


17.24. The reader may find the foregoing proof easier to follow if we express its 
main points in geometrical terminology. 

Consider the $m$ proportions $n_j/n$ as the co-ordinates of a point in a space of $m$
dimensions. The theoretical frequencies $f_j(\theta_0) = 1/m$ define a point, say $M$, in this space,
and the sample point $R$, corresponding to an observed set of $n_j$'s, may be regarded as varying
round the “theoretical” point $M$. The quantities $z$ are the co-ordinates of any point in the
hyperplane $\sum (z) = 1$, which contains $M$ and $R$. (See Fig. 17.1.)

Now, for any sample point $R$ the maximum likelihood estimator $t'$ assumes a value $t_0$
which in general differs from $\theta_0$. This value defines $m$ quantities $f_j(t_0)$ which determine a
point $Z$. This also lies in the hyperplane since the sum of the frequencies is unity. Thus the
points $R$ determine a set of points $Z$ which all lie on the curve defined for variations in $\theta$ by

$$z_j = f_j(\theta). \tag{17.53}$$


Fig. 17.1. 


Since $\theta = \theta_0$ is a possible value of $\theta$, the point $M$ lies on this curve; $R$ in general does
not.

What we have shown in analytical form is that the function $K$, which is the logarithm
of a likelihood function defined for any point on the hyperplane, has a maximum at $R$
and a maximum on the curve itself at $Z$. As the sample size increases, $R$ is as near as
we like to $M$ (in the sense of convergence in probability, that is to say, that as high a
proportion of points $R$ as we like are as near as we like to $M$). This involves that $Z$ also is as
near as we like to $M$. This in turn involves that the parameter-value corresponding to
$Z$ is as close as we like to $\theta_0$ for as high a proportion of the possible points $Z$ as we like,
which is our theorem.


17.25. We now prove a second fundamental property of maximum likelihood
estimators, namely that they tend to normality for large $n$. More precisely,

(a) If condition (a) at the beginning of 17.23 is satisfied; and if (more stringently
than condition (b) of that section) (c) in a $\theta$-interval containing the true value $\theta_0$,
$\partial f/\partial\theta$ is continuous in $\theta$ for every $x$, approaches a continuous function of $\theta$ as $x$
tends to infinity, and does not vanish in some interval,
then the maximum likelihood estimator $t$ tends to normality for large $n$. The condition
as to $\partial f/\partial\theta$ ensures that in the transformation to finite range $\partial f/\partial\theta$ remains continuous
in $\theta$ throughout that range.

We recall that if

$$\xi_j = \frac{n_j}{n} - \frac{1}{m}, \tag{17.54}$$

that is, if the $\xi$'s are the deviations of the actual proportional frequencies $n_j/n$ from the
“expected” frequencies $1/m$, the distribution of the $\xi$'s in the limit will be normal and their
distribution spherically symmetric. Consider again the orthogonal space of the previous
section. The sample points are distributed about the point $M$ in a symmetrical form which
tends to normality. If we choose a set of orthogonal axes in the hyperplane, the projection
of the sample points on any axis is in the limit distributed normally with variance $1/mn$.

In the neighbourhood of $M$ the curve (17.53) approaches its tangent line as $n$ becomes
larger, and we therefore have, if $s$ is the distance along the tangent from $M$,

$$\left(\frac{ds}{d\theta}\right)^2 = \sum_{j} \left(\frac{\partial f_j}{\partial\theta}\right)^2, \tag{17.55}$$

as follows from (17.53). (The tangent exists in virtue of our hypothesis as to the differential
coefficients of $f$ in $\theta$.)

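Now consider the point $Z$ on the curve corresponding to the sample point $R$. We
know that at $Z$ the function

$$K = \sum_{j} n_j \log\left(z_j + \frac{1}{m}\right), \tag{17.56}$$

where we now measure $z$ from $M$, is a maximum for variations in $z$ such that $Z$ lies on
the curve. $R$ is determined by finding the hypersurface (17.56) tangent to the hyperplane
$\sum (z_j) = 0$, for at that point $\delta K$ is zero. We know that the co-ordinates of
this point are $z_j = n_j/n - 1/m$ and that $R$ is the point of tangency. $K_R$ as defined in
17.23 is the value of $K$ at $R$, and $K_Z$ the value at $Z$. We then have, by Taylor's theorem,

$$K_Z = K_R + \sum_{j} \left(\frac{\partial K}{\partial z_j}\right)_R \delta z_j + \frac{1}{2}\sum_{j}\sum_{k} \left(\frac{\partial^2 K}{\partial z_j\, \partial z_k}\right)_R \delta z_j\, \delta z_k \tag{17.57}$$

to the second order of small quantities in $\delta z$. From (17.56) we see that, at $R$,

$$\frac{\partial K}{\partial z_j} = n, \tag{17.58}$$

$$\frac{\partial^2 K}{\partial z_j\, \partial z_k} = \begin{cases} 0, & j \ne k, \\ -\dfrac{n^2}{n_j}, & j = k. \end{cases} \tag{17.59}$$

Hence

$$K_Z = K_R + n \sum (\delta z_j) - \frac{n^2}{2} \sum \frac{(\delta z_j)^2}{n_j}. \tag{17.60}$$

Now $\sum (\delta z_j) = 0$, for the variation takes place in the hyperplane. Hence, for given $R$,
$Z$ is the point for which $\sum (\delta z_j)^2/n_j$ is a minimum. As $n$ tends to infinity the $n_j$'s tend to
equality, and hence $Z$ is the point on the curve which is nearest to $R$. Thus $R$ is, in the
limit, projected orthogonally on to the curve, that is to say, in the limit, on the tangent
line.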

Now we know that these points are distributed normally with variance $1/mn$ and
this proves the theorem. We may also evaluate the variance of the maximum likelihood
estimator; for

$$\operatorname{var} t' = \frac{\operatorname{var} s}{\sum_{j=1}^{m} (\partial f_j/\partial\theta)^2} = \frac{1}{mn \sum_{j=1}^{m} (\partial f_j/\partial\theta)^2}, \tag{17.61}$$

and since $t'$ approaches $t$ for fine grouping we have also, remembering that $1/m = f_j(\theta_0)$,

$$\frac{1}{\operatorname{var} t} = n \int \frac{1}{f} \left(\frac{\partial f}{\partial\theta}\right)^2 dx, \tag{17.62}$$

where $\theta$ is to be put equal to $\theta_0$ on the right.

It may be remarked that condition (c) at the beginning of the section prevents the
vanishing of $\partial f/\partial\theta$, which might render the expression (17.61) nugatory.


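17.26. We have, then, under the afore-mentioned conditions,

$$\frac{1}{\operatorname{var} t} = n E\left(\frac{\partial \log f}{\partial\theta}\right)^2 .$$

If the range is independent of $\theta$, or if $f$ and $\partial f/\partial\theta$ vanish at any extremity of the range which
depends on $\theta$, we have the alternative form

$$\frac{1}{\operatorname{var} t} = -n E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right). \tag{17.63}$$

In fact, since $\int_a^b f\, dx = 1$, where $a$, $b$ are the limits of the range and may contain $\theta$, we
have *

$$0 = \int_a^b \frac{\partial f}{\partial\theta}\, dx = \int_a^b \frac{\partial \log f}{\partial\theta}\, f\, dx .$$

Differentiating again, we have

$$0 = \int_a^b \left\{\frac{\partial^2 \log f}{\partial\theta^2} + \left(\frac{\partial \log f}{\partial\theta}\right)^2\right\} f\, dx + \left(\frac{\partial f}{\partial\theta}\right)_{x=b} \frac{\partial b}{\partial\theta} - \left(\frac{\partial f}{\partial\theta}\right)_{x=a} \frac{\partial a}{\partial\theta}. \tag{17.64}$$

Again, if the range is independent of $\theta$ or if $\partial f/\partial\theta$ vanishes at the extremity, the last two
terms on the right in (17.64) are zero, and we have (reverting to our usual convention as
to limits)

$$\int_{-\infty}^{\infty} \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx = -\int_{-\infty}^{\infty} \frac{\partial^2 \log f}{\partial\theta^2}\, f\, dx,$$

and the result follows from (17.62).

* The operation of differentiating under the integral sign requires certain conditions as to uniform
convergence, even when the limits are independent of $\theta$. To avoid prolixity we shall always assume
that the conditions hold unless the contrary is stated. The point gives rise to no statistical difficulty
but is troublesome when one is aiming at complete mathematical rigour.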


17.27. We now prove a third fundamental property concerning the efficiency of
maximum likelihood estimates.

If $t$ be any estimator of $\theta$, the range of $f(x, \theta)$ is independent of $\theta$, and in large samples
$t$ is distributed normally about mean $\theta_0$ (the true value of $\theta$) with variance $v$; then

$$\frac{1}{v} \text{ cannot exceed } n \int_{-\infty}^{\infty} \frac{1}{f}\left(\frac{\partial f}{\partial\theta}\right)^2 dx, \text{ with } \theta = \theta_0;$$

and hence, if a maximum likelihood estimator exists, it is most-efficient in the class of
such estimators.

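By hypothesis, we have in the limit for the frequency function of $t$,

$$\phi = \frac{1}{\sqrt{2\pi v}} \exp\left\{-\frac{(t - \theta)^2}{2v}\right\}, \tag{17.65}$$

and hence

$$\frac{\partial^2 \log \phi}{\partial\theta^2} = -\frac{1}{v}, \tag{17.66}$$

where, for convenience, we drop the suffix of $\theta$ until the end of the proof. We then have

$$\frac{1}{v} = \int \frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2 dt. \tag{17.67}$$

Now consider

$$u = \frac{\partial}{\partial\theta} (\log L) \tag{17.68}$$

as a random variable over the possible values $x_1 \ldots x_n$ conditioned by $t$ = constant.
Since the frequency of $u$ is $L$, we have

$$E(u) = \frac{\sum (Lu)}{\sum (L)} \tag{17.69}$$

with summation (or integration) over the range of $x$'s. Now $\phi$ is the frequency of all
samples having a constant $t$, and hence

$$\phi = \sum (L).$$

Hence

$$\operatorname{var} u = \frac{\sum (Lu^2)}{\phi} - \frac{\{\sum (Lu)\}^2}{\phi^2} = \frac{1}{\phi} \sum \left\{\frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2\right\} - \frac{1}{\phi^2}\left\{\sum \left(\frac{\partial L}{\partial\theta}\right)\right\}^2. \tag{17.70}$$

Now var $u$ cannot be negative and $\phi$ is not negative, and hence

$$\sum \left\{\frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2\right\} - \frac{1}{\phi}\left\{\sum \left(\frac{\partial L}{\partial\theta}\right)\right\}^2 \ge 0. \tag{17.71}$$

But

$$\sum \left(\frac{\partial L}{\partial\theta}\right) = \frac{\partial\phi}{\partial\theta},$$

and hence, substituting in (17.71) and integrating over all $t$, we have

$$\int \sum \left\{\frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2\right\} dt \ge \int \frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2 dt = \frac{1}{v}. \tag{17.72}$$

Now $\sum$ is carried out over all $x$ for constant $t$ and the integration over all $t$, so that the two
summations together are equivalent to summation over the $x$'s without restriction. Hence

$$\frac{1}{v} \le \int \cdots \int \frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2 dx_1 \ldots dx_n = n \int \frac{1}{f}\left(\frac{\partial f}{\partial\theta}\right)^2 dx,$$

which establishes the result, since the expression on the right is the reciprocal of the variance
of the maximum likelihood estimator, if it exists.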


17.28. The fourth fundamental theorem of maximum likelihood estimators is as
follows:

If a sufficient estimator exists, it is a function of the maximum likelihood estimator.
In fact, the likelihood can then be put in the form

$$L = L_1 (t, \theta)\, L_2 (x_1 \ldots x_n),$$

where $L_2$ does not contain $\theta$. Hence

$$\frac{\partial}{\partial\theta} \log L = \frac{\partial}{\partial\theta} \log L_1 = \psi (\theta, t), \text{ a function of } \theta \text{ and } t \text{ only.} \tag{17.73}$$

Hence, for fixed $t$, $\frac{\partial}{\partial\theta} \log L$ is constant, and it follows from the previous section that the
variance of $t$ is equal to the variance of a most-efficient estimator (for var $u$ is then zero
for fixed $t$ and the inequality (17.72) becomes an equality). Hence the sufficient estimator
is most-efficient, confirming the result of 17.18.

It follows from (17.73) that the maximum likelihood estimator is given by

$$\psi (\theta, t) = 0, \tag{17.74}$$

which proves the theorem.

Conversely, if $t$ is such that (17.73) is true, it must be sufficient; for then we have

$$\log L = C + \int \psi (\theta, t)\, d\theta,$$

where $C$ does not depend on $\theta$ and the likelihood is of the requisite form.


Example 17.7 

Consider the estimation of the parameter $m$ in the population

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2\right\} dx, \qquad -\infty < x < \infty,$$

where $\sigma$ is known. The frequency function is easily seen to obey the conditions relating
to maximum likelihood estimators. We have

$$\log L = -n \log \{\sigma\sqrt{2\pi}\} - \frac{1}{2\sigma^2} \sum_{j=1}^{n} (x_j - m)^2$$

and hence the maximum likelihood estimator is the root of

$$\frac{\partial}{\partial m} \log L = \frac{1}{\sigma^2} \sum (x - m) = 0,$$

giving

$$\hat{m} = \frac{1}{n} \sum (x) = \bar{x}.$$

It is frequently convenient to denote the estimator of a parameter by writing a
circumflex accent over it in this way.

In this case the sample mean is the maximum likelihood estimator. It is therefore
most-efficient and no other estimator can have a smaller variance in the limit. For the
variance we have, from (17.63),

$$\frac{1}{\operatorname{var} \hat{m}} = -n E\left(\frac{\partial^2 \log f}{\partial m^2}\right) = \frac{n}{\sigma^2},$$

giving the familiar result

$$\operatorname{var} \bar{x} = \frac{\sigma^2}{n}.$$

This, as it happens, is true for any $n$. The estimator is also sufficient, for

$$\frac{\partial}{\partial m} \log L = \frac{1}{\sigma^2} (n\bar{x} - nm) = \text{a function of } m \text{ and } \bar{x} \text{ only.}$$

The condition that $\sigma$ is known is to be noted. Complications arise when two parameters
are estimated simultaneously, as we shall see presently.

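Example 17.8

Consider the estimation of $\theta$ in the Type III distribution

$$dF = \frac{x^{p-1} e^{-x/\theta}}{\theta^{p}\, \Gamma(p)}\, dx, \qquad 0 \le x < \infty,$$

where $p$ is known. We have

$$\log f = (p - 1) \log x - \frac{x}{\theta} - \log \Gamma(p) - p \log \theta$$

and hence, dropping terms independent of $\theta$,

$$\log L = -\frac{1}{\theta} \sum (x) - np \log \theta.$$

The equation of maximum likelihood is then

$$\frac{1}{\theta^2} \sum (x) - \frac{np}{\theta} = 0,$$

giving

$$\hat\theta = \frac{\sum (x)}{np} = \frac{\bar{x}}{p}.$$

The variance is given, by (17.63), as

$$\frac{1}{\operatorname{var} \hat\theta} = -n E\left(\frac{p}{\theta^2} - \frac{2x}{\theta^3}\right) = \frac{np}{\theta^2}, \qquad \operatorname{var} \hat\theta = \frac{\theta^2}{np},$$

where $\theta$ is the true value of the parameter. We could also have obtained this result directly
(and again it happens to be true for all $n$). From Example 10.11 (vol. I, p. 244) we have
for the distribution of $\bar{x}/p = \hat\theta$,

$$dF = \frac{(np)^{np}}{\Gamma(np)} \left(\frac{\hat\theta}{\theta}\right)^{np-1} \exp\left(-\frac{np\,\hat\theta}{\theta}\right) \frac{d\hat\theta}{\theta},$$

from which the first two moments about the origin are

$$\mu'_1 = \theta, \qquad \mu'_2 = \frac{np + 1}{np}\, \theta^2,$$

giving

$$\operatorname{var} \hat\theta = \frac{\theta^2}{np}.$$

We note that the likelihood function may be put in the form

$$\log L = (p - 1) \sum \log x - n \log \Gamma(p) - \frac{np\,\hat\theta}{\theta} - np \log \theta,$$

from which it is evident that $\hat\theta$ is sufficient.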
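
The two results of this example, that the estimator $\bar{x}/p$ has mean $\theta$ and variance $\theta^2/np$, can be checked by a small sampling experiment. The sketch below is illustrative only: the parameter values, the number of repetitions and the function names are assumptions made for the illustration. It relies on `random.gammavariate(alpha, beta)` from the Python standard library, whose frequency function is proportional to $x^{\alpha-1} e^{-x/\beta}$, so that $\alpha$ plays the part of $p$ and $\beta$ that of $\theta$.

```python
import random
import statistics

def type3_scale_mle(xs, p):
    """theta-hat = x-bar / p, the root of the likelihood equation of Example 17.8."""
    return sum(xs) / (len(xs) * p)

def simulate_estimates(theta, p, n, repeats, seed=0):
    """Draw `repeats` samples of size n and return the estimate from each."""
    rng = random.Random(seed)
    # gammavariate(alpha, beta): alpha is the index p, beta the scale theta.
    return [type3_scale_mle([rng.gammavariate(p, theta) for _ in range(n)], p)
            for _ in range(repeats)]

theta, p, n = 3.0, 2.0, 50             # illustrative values, not from the text
estimates = simulate_estimates(theta, p, n, repeats=4000)
print(statistics.mean(estimates), theta)                      # observed and true mean
print(statistics.variance(estimates), theta ** 2 / (n * p))   # observed and theoretical variance
```

With these settings the observed mean and variance of the estimates should lie close to $\theta = 3$ and $\theta^2/np = 0.09$ respectively, up to sampling fluctuation.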


Example 17.9 

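Consider the estimation of the parameter $\lambda$ in the Poisson distribution whose general
term is

$$e^{-\lambda} \frac{\lambda^{x}}{x!}.$$

In this case the likelihood function is discontinuous and we have

$$L = \frac{e^{-n\lambda} \lambda^{\sum x}}{x_1 ! \ldots x_n !}.$$

Hence

$$\frac{\partial}{\partial\lambda} \log L = -n + \frac{n\bar{x}}{\lambda},$$

giving $\hat\lambda = \bar{x}$, the sample mean.
For the variance we have

$$\frac{1}{\operatorname{var} \hat\lambda} = -n \sum_{x=0}^{\infty} \frac{e^{-\lambda}\lambda^{x}}{x!}\left(-\frac{x}{\lambda^2}\right) = \frac{n}{\lambda},$$

$$\operatorname{var} \hat\lambda = \frac{\lambda}{n}, \text{ a familiar result.}$$

It is easy to see in this case also that $\hat\lambda$ is sufficient.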
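
The sum entering the variance can be evaluated numerically as a check. The sketch below is an illustration under stated assumptions: the infinite sum is truncated after 200 terms, and the values of $\lambda$ and $n$ are arbitrary. It computes $-E(\partial^2 \log f/\partial\lambda^2) = \sum f(x)\, x/\lambda^2$ term by term and compares the result with $1/\lambda$.

```python
import math

def poisson_information(lam, terms=200):
    """-E(d^2 log f / d lambda^2) = sum over x of f(x) * x / lambda**2.

    The infinite sum is truncated after `terms` terms (an assumption of the sketch).
    """
    total = 0.0
    prob = math.exp(-lam)              # f(0)
    for x in range(terms):
        total += prob * x / lam ** 2
        prob *= lam / (x + 1)          # f(x + 1) obtained from f(x)
    return total

lam, n = 3.5, 20                       # illustrative values, not from the text
info = poisson_information(lam)
print(info, 1.0 / lam)                 # the sum and 1/lambda
print(1.0 / (n * info), lam / n)       # large-sample variance and lambda/n
```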

Example 17.10 

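What is the most general form of distribution, differentiable in $\theta$, for which the
sample-mean is the maximum likelihood estimator?

We are given that a solution of

$$\sum \frac{\partial \log f}{\partial\theta} = 0$$

is

$$\theta = \frac{\sum (x)}{n},$$

or $\sum (x - \theta) = 0$.

This is true for all $x$ and $\theta$, and hence

$$\frac{\partial \log f}{\partial\theta} = K (x - \theta),$$

where $K$ is independent of $x$ but may be dependent on $\theta$, say equal to $\dfrac{d^2\psi}{d\theta^2}$. Then,
integrating,

$$\log f = \int \frac{d^2\psi}{d\theta^2} (x - \theta)\, d\theta = (x - \theta) \frac{d\psi}{d\theta} + \psi + \zeta (x),$$

where $\zeta (x)$ is an arbitrary function of $x$. Hence

$$f = k \exp \left\{(x - \theta) \frac{d\psi}{d\theta} + \psi (\theta) + \zeta (x)\right\},$$

which is the most general form of $f$.

If $\psi (\theta) = \tfrac{1}{2}\theta^2$, $\zeta (x) = -\tfrac{1}{2}x^2$,
the form becomes the normal distribution

$$f = k \exp \{-\tfrac{1}{2} (x - \theta)^2\}.$$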

Successive Approximations to Efficient Estimators 

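17.29. In the examples we have just given, the solution of the maximum likelihood
equation was carried out without difficulty. It frequently happens, however, that the
equation is by no means so easy to solve explicitly, though it can sometimes be solved
for particular values of $x$ by iterative methods. Another possibility is to compute an
inefficient estimator and correct it by an extra term, which can be obtained as follows:

Let $t'$ be an inefficient estimator and $t$ a most-efficient estimator. Let

$$d = t' - t.$$

Then

$$\operatorname{var} d = \operatorname{var} t' + \operatorname{var} t - 2 \operatorname{cov} (t', t). \tag{17.75}$$

Remembering that if $E$ is the efficiency of $t'$,

$$\operatorname{var} t = E \operatorname{var} t', \qquad \frac{\operatorname{cov} (t', t)}{(\operatorname{var} t \operatorname{var} t')^{1/2}} = \sqrt{E} \quad \text{(see (17.39))},$$

we have

$$\operatorname{var} d = \frac{1 - E}{E} \operatorname{var} t. \tag{17.76}$$

If then $t'$ is “nearly” efficient, that is, if $1 - E$ is small, the average value of $d = t' - t$
will be small.

If the maximum likelihood equation is

$$\left(\frac{\partial L}{\partial\theta}\right)_{\theta = t} = 0,$$

consider

$$t'' = t' + \operatorname{var} t \left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'}. \tag{17.77}$$

We have

$$\left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'} = \left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t} + (t' - t) \left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{\theta = t} + \text{terms of higher order} = (t' - t) \left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{\theta = t}. \tag{17.78}$$

For large $n$, approximately

$$\frac{1}{\operatorname{var} t} = -\left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{\theta = t},$$

and hence, approximately,

$$\left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'} = -\frac{t' - t}{\operatorname{var} t}.$$

Hence

$$t'' = t' + \operatorname{var} t \left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'} = t' - (t' - t) = t,$$

and $t''$ is an efficient estimator to a better order of approximation. This process may be
repeated and, rather like Newton's successive approximation to the roots of an equation,
may be expected to improve the efficiency of an estimator.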

Example 17.11 

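Suppose we have to estimate $\theta$, the parameter in the Cauchy population

$$dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty.$$

We have already seen that the sample-mean is not a satisfactory estimate and that for
large samples the median is consistent and has variance $\pi^2/4n$.
The equation of maximum likelihood gives

$$\frac{\partial \log L}{\partial\theta} = \sum \left\{\frac{2 (x - \theta)}{1 + (x - \theta)^2}\right\} = 0.$$

This is a $(2n - 1)$-ic in $\theta$ and correspondingly difficult to solve. We may, however,
find the variance of the solution $\hat\theta$ from (17.63). We have

$$E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{2 (x - \theta)^2 - 2}{\{1 + (x - \theta)^2\}^3}\, dx = \frac{4}{\pi} \int_0^{\infty} \frac{x^2 - 1}{(1 + x^2)^3}\, dx = -\frac{1}{2}.$$

Hence

$$\operatorname{var} \hat\theta = \frac{2}{n}.$$

The median, therefore, has an efficiency of $8/\pi^2 \approx 0.8$, and we expect that

$$t'' = t' + \operatorname{var} \hat\theta \left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'} = t' + \frac{4}{n} \sum \left\{\frac{x - t'}{1 + (x - t')^2}\right\},$$

where $t'$ denotes the median, will be an improved estimator.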
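
The correction of the median can be tried on a small set of figures. The sketch below is illustrative and rests on assumptions not in the text: the nine data values are invented, the function names are hypothetical, and the large-sample variance $2/n$ is used although $n$ is small. It applies the correction once, then repeats it, and prints the value of $\partial \log L/\partial\theta$ at each stage so that the approach to a root of the likelihood equation can be seen.

```python
import statistics

def score(xs, theta):
    """d log L / d theta for the Cauchy population of Example 17.11."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def corrected(xs, t):
    """One step of (17.77) with var(theta-hat) = 2/n, i.e. t + (2/n) * score."""
    return t + (2.0 / len(xs)) * score(xs, t)

sample = [-1.3, -0.4, 0.2, 0.5, 0.9, 1.1, 1.4, 2.0, 4.7]   # illustrative data
t_median = statistics.median(sample)
t_once = corrected(sample, t_median)

t_repeated = t_median
for _ in range(50):                    # repeat the correction, as the text suggests
    t_repeated = corrected(sample, t_repeated)

print(t_median, t_once, t_repeated)
print(score(sample, t_median), score(sample, t_once), score(sample, t_repeated))
```

For these figures a single correction already reduces the magnitude of the derivative, and repeated correction drives it towards zero. That behaviour is a property of this sample and is not guaranteed for every sample.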


Most General Form of Distributions possessing Sufficient Estimators 
17.30. If $t$ is sufficient for $\theta$ we have

$$\frac{\partial \log L}{\partial\theta} = K (t, \theta), \tag{17.79}$$

where $K$ is some function of $t$ and $\theta$. Regarding this as an equation in $t$ we see that it
remains true for any particular value of $\theta$, say zero. It is then evident that $t$ must be
expressible in the form

$$t = M \left\{\sum k (x)\right\}, \tag{17.80}$$

where $M$ and $k$ are arbitrary functions. If $w = \sum k (x)$ then $K$ is a function of $\theta$ and
$w$ only, say $N (\theta, w)$. We have then

$$\frac{\partial^2 \log L}{\partial\theta\, \partial x_j} = \frac{\partial N}{\partial w} \frac{\partial w}{\partial x_j}. \tag{17.81}$$

Now the left-hand side is a function of $\theta$ and $x_j$ only and $\partial w/\partial x_j$ is a function of $x_j$ only. Hence
$\partial N/\partial w$ is a function of $\theta$ and $x_j$ only. But it must be symmetrical in the $x$'s and hence is a
function of $\theta$ only. Hence, integrating with respect to $w$, we have

$$N (\theta, w) = w\, p (\theta) + q (\theta),$$

where $p$ and $q$ are arbitrary functions of $\theta$. Thus

$$\frac{\partial}{\partial\theta} (\log L) = \sum \frac{\partial}{\partial\theta} \{\log f (x_j, \theta)\} = p (\theta) \sum k (x_j) + q (\theta), \tag{17.82}$$

whence

$$\frac{\partial}{\partial\theta} \log f (x, \theta) = p (\theta)\, k (x) + \frac{q (\theta)}{n},$$

giving

$$f (x, \theta) = \exp \{p (\theta)\, k (x) + q (\theta) + r (x)\}, \tag{17.83}$$

where we still write $p$ and $q$ for the integrated functions.

The expression may also be written

$$f (x, \theta) = Q (\theta)\, R (x) \exp \{p (\theta)\, k (x)\} \tag{17.84}$$

or, if we simplify the specification of the distribution by writing $\theta$ instead of $p (\theta)$,

$$f (x) = Q (\theta)\, R (x) \exp \{\theta\, k (x)\}. \tag{17.85}$$

It will be found that if (17.85) holds, the likelihood function is of the required form for
the existence of a sufficient estimator, so that the equation is sufficient as well as necessary.
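
As a hedged illustration of the form (17.85), which is not worked in the text at this point, the Poisson term of Example 17.9 may be thrown into that form by taking the logarithm of $\lambda$ as the parameter. The discrete variate is treated here by analogy with the continuous case, and the bracketed labels identify the three factors.

```latex
f(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}
     = \underbrace{\exp(-e^{\theta})}_{Q(\theta)}\;
       \underbrace{\frac{1}{x!}}_{R(x)}\;
       \exp\{\theta\,\underbrace{x}_{k(x)}\},
\qquad \theta = \log\lambda ,
\\[6pt]
L = \{Q(\theta)\}^{n}\,\Bigl\{\prod_{j=1}^{n} R(x_j)\Bigr\}\,
    \exp\Bigl\{\theta \sum_{j=1}^{n} x_j\Bigr\}.
```

The likelihood then involves $\theta$ and the observations jointly only through $\sum x$, which agrees with the sufficiency of the sample mean noted in Example 17.9.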


Distribution of Sufficient Estimators 

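17.31. It is remarkable that the distribution of a sufficient estimator can be obtained
directly from the likelihood function. From (17.85) we have

$$\log L = n \log Q + \sum \log R (x) + \theta \sum k (x),$$

giving, for the maximum likelihood estimator,

$$\frac{n}{Q} \frac{\partial Q}{\partial\theta} + \sum k (x) = 0. \tag{17.86}$$

Now, for the characteristic function $\phi (\alpha)$ of $w$ $(= \sum k (x))$ we have

$$\phi (\alpha) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{i\alpha w} f (x_1, \theta)\, dx_1 \ldots f (x_n, \theta)\, dx_n = \left\{\int_{-\infty}^{\infty} Q (\theta)\, R (x)\, e^{(\theta + i\alpha) k (x)}\, dx\right\}^{n} = \left\{\frac{Q (\theta)}{Q (\theta + i\alpha)}\right\}^{n}.$$

Hence the frequency function of $w$, if existent, is

$$f (w) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\alpha w} \left\{\frac{Q (\theta)}{Q (\theta + i\alpha)}\right\}^{n} d\alpha. \tag{17.87}$$

Now from (17.86),

$$w = -\left(\frac{n}{Q} \frac{\partial Q}{\partial\theta}\right)_{\theta = t} = n S (t), \text{ say,}$$

and hence the frequency function of the estimator $t$ is

$$f (t) = \frac{n}{2\pi} \frac{dS}{dt} \int_{-\infty}^{\infty} e^{-i\alpha n S (t)} \left\{\frac{Q (\theta)}{Q (\theta + i\alpha)}\right\}^{n} d\alpha. \tag{17.88}$$

Example 17.12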

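The normal distribution with unit variance may be put in the form

$$f = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}\, e^{-\frac{1}{2}\theta^2}\, e^{\theta x}.$$

Comparing this with (17.85), we see that if

$$Q (\theta) = e^{-\frac{1}{2}\theta^2}, \qquad R (x) = \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}}, \qquad k (x) = x,$$

the condition for a sufficient estimator is satisfied. That this is (as we already know)
the mean $\bar{x}$ may be confirmed from (17.88). We have

$$S (\theta) = -\frac{d}{d\theta} \log Q = \theta;$$

and hence for the frequency function of the estimator $\bar{x}$,

$$f (\bar{x}) = \frac{n}{2\pi} \int_{-\infty}^{\infty} e^{-i\alpha n \bar{x}} \left\{\frac{e^{-\frac{1}{2}\theta^2}}{e^{-\frac{1}{2}(\theta + i\alpha)^2}}\right\}^{n} d\alpha = \frac{n}{2\pi} \int_{-\infty}^{\infty} \exp \{-\tfrac{1}{2} n \alpha^2 - i\alpha n (\bar{x} - \theta)\}\, d\alpha = \sqrt{\frac{n}{2\pi}} \exp \{-\tfrac{1}{2} n (\bar{x} - \theta)^2\}.$$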


Example 17.13 

The Type III distribution considered in Example 17.8 may be put in the slightly
different form

$$dF = \frac{\gamma^{p}}{\Gamma (p)}\, e^{-\gamma x} x^{p-1}\, dx, \qquad 0 \le x < \infty.$$

Regarding $p$ as known and considering $\gamma$ as the parameter under estimate, we see that
a sufficient estimator exists, because we may write

$$Q (\gamma) = \frac{\gamma^{p}}{\Gamma (p)}, \qquad R (x) = x^{p-1}, \qquad k (x) = -x,$$

which throws the distribution into the form (17.85). We have found the estimator and
its distribution in Example 17.8.

On the other hand, suppose that $\gamma$ is known and we wish to estimate $p$. Writing

$$Q (p) = \frac{\gamma^{p}}{\Gamma (p)}, \qquad R (x) = \frac{e^{-\gamma x}}{x}, \qquad k (x) = \log x,$$

we see that a sufficient estimator for $p$ also exists. It is the solution of

$$-\frac{\partial}{\partial p} \log \Gamma (p) + \log \gamma + \frac{1}{n} \sum \log x = 0,$$

which does not permit of expression of $p$ as a simple function of the $x$'s. The sampling
distribution is not expressible in a simple form.


Example 17.14 

Consider again the Cauchy distribution

$$dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty.$$

Evidently this cannot be thrown into the form (17.85) and hence no sufficient estimator
exists. We have already found (Example 17.11) that there is an efficient estimator. For
finite $n$ no single estimator can contain all that the sample can tell us about $\theta$.


Sufficient Estimators when the Range depends on the Parameter 

17.32. One of the conditions of the theorem of 17.23 and that of 17.27 is that the
range should be independent of $\theta$. In the contrary case our results, particularly for sufficient
estimators, require reconsideration.

Suppose the range of the frequency function is from $\theta$ to $b$, where $b$ is fixed. If there
is a sufficient estimator for $\theta$, say $t$, the distribution of any other estimator, for fixed $t$, is
independent of $\theta$. Take $x_1$, the lowest value of the sample, as such other estimator. Then
if $t$ is fixed the distribution of $x_1$ is independent of $\theta$, which is clearly impossible unless in
fixing $t$ we also fix $x_1$, that is to say, $t$ is a function of $x_1$. Thus if a sufficient estimator
exists it must be a function of $x_1$.

Similarly if the range is from $a$ to $\theta$, a sufficient estimator for $\theta$ must be a function
of the largest sample member.

17.33. If $x_1$ or some function of it is sufficient for $\theta$, the lower extremity of the range,
and $x_1$ is fixed, the probability that any particular sample value $x$ is greater than $x_1$ is
proportional to $f (x, \theta)$. This must be independent of $\theta$, since $x_1$ is sufficient, and hence
so is $f (x, \theta)/f (x_1, \theta)$. Thus

$$f (x, \theta) = \frac{g (x)}{h (\theta)}, \tag{17.89}$$

and this is the most general form admitting a sufficient estimator.

It remains true in such circumstances that the smallest member of the sample is
a maximum likelihood estimator. For the likelihood is

$$L = \frac{g (x_1) \ldots g (x_n)}{\{h (\theta)\}^{n}},$$

which is clearly a maximum when $h (\theta)$ is a minimum. Now since the total frequency is
unity we have, from (17.89),

$$h (\theta) = \int_{\theta}^{b} g (x)\, dx. \tag{17.90}$$

$\theta$ cannot be greater than $x_1$, for then such a sample value could not appear. The value
which minimises $h (\theta)$ is seen from (17.90) to be that which minimises the range, i.e. $x_1$.

17.34. When both extremes of the range, $a$ and $b$, depend on $\theta$, some further
modification is necessary. Suppose that $a$ is equal to $\theta$ and that $b (\theta)$ is some strictly decreasing
function of $\theta$. Let $X_n$ be the value such that $b (X_n) = x_n$, the greatest member of the
sample, and let $t$ be the smaller of $x_1$ and $X_n$. Then of the inequalities

$$t \le x_1, \qquad b (t) \ge x_n \tag{17.91}$$

one at least is true. But the first equality implies that $t \ge \theta$ and the second that
$b (t) \le b (\theta)$, and either of these two implies the other. Hence both inequalities in (17.91)
are true, and

$$\theta \le t \le x_1 \le x_n \le b (t) \le b (\theta). \tag{17.92}$$

Samples with fixed $t$ then lie in a fixed range, and hence $t$ is sufficient if the frequency
function is of the form (17.89). It would seem that this remains the most general form of
frequency function admitting a sufficient estimator when both extremes of the range
depend on $\theta$.


Example 17.15 

Consider the rectangular distribution

$$dF = \frac{dx}{2\theta}, \qquad -\theta \le x \le \theta.$$

If we take the ordinary likelihood equation we get

$$\frac{\partial}{\partial\theta} \log L = -\frac{\partial}{\partial\theta}\, n \log (2\theta) = -\frac{n}{\theta}.$$

For this to vanish $\theta$ must tend to infinity, an obviously nugatory result. In accordance
with the above discussion we should take as our estimate of the lower terminal $-\theta$ the smaller
of $x_1$ and $-x_n$ (so that $\theta$ itself is estimated by the larger of $-x_1$ and $x_n$),
and this is obviously sufficient, for nothing in the sample can tell us more about the
terminals of the range than its most extreme members.
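
A short numerical sketch may make the point concrete. It reads the rule of the example as an estimate of the lower terminal $-\theta$, so that $\theta$ itself is estimated by the larger of $-x_1$ and $x_n$. The data and function names are hypothetical. The likelihood $(2\theta)^{-n}$ falls as $\theta$ increases and is zero for any $\theta$ too small to contain the sample, so the estimate is the smallest admissible value.

```python
def rectangular_estimate(xs):
    """Estimate of theta for the rectangular population on (-theta, theta).

    The lower terminal -theta is estimated by the smaller of x_1 and -x_n,
    so theta is estimated by the larger of -x_1 and x_n.
    """
    x_1, x_n = min(xs), max(xs)
    return max(-x_1, x_n)

def likelihood(xs, theta):
    """L = (2 theta)^(-n) when every observation lies in (-theta, theta), else 0."""
    if any(abs(x) > theta for x in xs):
        return 0.0
    return (2.0 * theta) ** (-len(xs))

sample = [0.4, -1.7, 0.9, 1.2, -0.3]   # illustrative data, not from the text
t = rectangular_estimate(sample)
print(t)
print(likelihood(sample, t), likelihood(sample, t + 0.5), likelihood(sample, t - 0.1))
```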


Intrinsic Accuracy 

17.35. If the sampling distribution of an estimator $t$ is

$$dF = \phi (t, \theta)\, dt, \tag{17.93}$$

we define the accuracy of $t$ as

$$\Gamma = \int \frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2 dt. \tag{17.94}$$

It is evidently essentially a positive quantity. We assume, unless the contrary is stated,
that the range is independent of $\theta$.

$\Gamma$ is the quantity we have already encountered in (17.67) as the reciprocal of the
variance of $t$ when it tends to normality in large samples. As in 17.27, we have

$$\Gamma \le n \int \frac{1}{f}\left(\frac{\partial f}{\partial\theta}\right)^2 dx = nI, \text{ say,} \tag{17.95}$$

where

$$I = \int \frac{1}{f}\left(\frac{\partial f}{\partial\theta}\right)^2 dx.$$

Now $I$ is independent of the estimator $t$ and we may call it the intrinsic accuracy of
the distribution $f$ in regard to $\theta$. It is intrinsic because it depends only on $f$. It may
be termed accuracy because it provides, for large samples at least, a minimum to the
variance of possible estimators of $\theta$. We know from 17.25 that under certain conditions
the maximum likelihood estimator attains this minimum for large samples.

17.36. We may now extend the definition of efficiency of an estimator to the case
of small samples. In fact, the efficiency is the ratio of the accuracy of an estimator to the
intrinsic accuracy of the distribution for the parameter under estimate. This is easily
seen to apply to the case of large samples for which efficiency was defined in 17.12, and
may be applied to finite samples or non-normal sampling variation. For such cases,
however, it is conceivable that the efficiency might exceed unity. A proof that this is not
so when the range is independent of $\theta$ is suggested in Exercise 17.12.


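17.37. If the range is independent of $\theta$ we have

$$E\left(\frac{\partial \log f}{\partial\theta}\right) = 0,$$

and hence the following three expressions for the intrinsic accuracy are equivalent:

$$E\left\{\left(\frac{\partial \log f}{\partial\theta}\right)^2\right\} = \operatorname{var}\left(\frac{\partial \log f}{\partial\theta}\right) = -E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right) = I. \tag{17.97}$$

This equivalence holds if $f$ is zero at the extremes of the range. For we then have

$$0 = \frac{\partial}{\partial\theta} \int_a^b f\, dx = -f (a, \theta) \frac{\partial a}{\partial\theta} + f (b, \theta) \frac{\partial b}{\partial\theta} + \int_a^b \frac{\partial f}{\partial\theta}\, dx,$$

in which the first two terms on the right vanish.

But if $f$ is not zero at the extremes the equivalence may break down. (Cf. Exercises 17.9
and 17.11.)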
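
The equivalence of the three expressions can be verified numerically for a particular population. The sketch below is illustrative only and takes the Cauchy population of Example 17.11 with $\theta = 0$, whose range is independent of $\theta$. The expectations are computed by the midpoint rule after the substitution $x = \tan y$, under which $f (x)\, dx = dy/\pi$. The function names and the number of steps are assumptions of the sketch.

```python
import math

def expectation(g, steps=20000):
    """E g(x) for a Cauchy variate with theta = 0, by the midpoint rule.

    Uses the substitution x = tan(y), under which f(x) dx = dy / pi on (-pi/2, pi/2).
    """
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        y = -0.5 * math.pi + (i + 0.5) * h
        total += g(math.tan(y))
    return total * h / math.pi

def d_log_f(x):
    """d log f / d theta evaluated at theta = 0."""
    return 2.0 * x / (1.0 + x * x)

def d2_log_f(x):
    """d^2 log f / d theta^2 evaluated at theta = 0."""
    return (2.0 * x * x - 2.0) / (1.0 + x * x) ** 2

mean_score = expectation(d_log_f)
e_square = expectation(lambda x: d_log_f(x) ** 2)
variance = e_square - mean_score ** 2
minus_e_second = -expectation(d2_log_f)
print(e_square, variance, minus_e_second)
```

All three printed values should agree closely with $\tfrac{1}{2}$, the value of $I$ implied by $\operatorname{var} \hat\theta = 2/n$ in Example 17.11.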


Amount of Information 

17.38. The quantity $nI$ has been called the amount of information about $\theta$ in the
sample of $n$, and $I$ may be called the amount of information per member of the sample.
The use of “information” in this specialised sense has not been universally accepted,
but some of the properties of $I$ are such as we should require of any measure of information.

(a) If the parent does not contain $\theta$, $I = 0$ so that no sample can tell us anything
about $\theta$, which must obviously be so.

(b) Since sufficient estimators contain all the relevant information in the sample
we expect their accuracy to be $nI$, and conversely. That this is so may be seen as
in 17.27 and 17.28. In fact, if $t$ is such that the equality in (17.72) holds, var $u$ = 0
and for fixed $t$, $\partial \log L/\partial\theta$ is constant, irrespective of the form of distribution of $t$. Log $L$
is then of the type required for sufficiency.

(c) The sum of the amounts of information in two independent sample-members
is the amount of information in the pair taken together. For if their joint distribution is

$$dF = f_1 (x, \theta)\, dx\, f_2 (y, \theta)\, dy,$$

we have for the intrinsic accuracy

$$-E\left(\frac{\partial^2 \log (f_1 f_2)}{\partial\theta^2}\right) = -E\left(\frac{\partial^2 \log f_1}{\partial\theta^2}\right) - E\left(\frac{\partial^2 \log f_2}{\partial\theta^2}\right), \tag{17.98}$$

which is the property stated.


Loss of Accuracy 

17.39. Where no sufficient estimator exists, it follows from (b) of the previous paragraph
that no estimator for finite $n$ can contain all the information in the sample. In
so far as any particular estimator falls short of the ideal we may be said to lose information
by using it. No estimator can avoid losing something, although of course some may
lose less than others.

Presumably the loss will be greater for large samples than for small ones, and will
be least for maximum likelihood estimators. We may calculate the loss in this case. If
$t$ is the maximum likelihood estimator of $\theta$, we have, to a first approximation,

\[ \frac{\partial \log L}{\partial \theta} = (\theta - t) \frac{\partial^2 \log L}{\partial \theta^2}. \tag{17.99} \]

The variance of $\partial \log L / \partial \theta$ in samples for which $t$ is constant is thus the variance of
$\partial^2 \log L / \partial \theta^2$ within the set multiplied by $(t - \theta)^2$. Now the total loss of information,
from 17.27, is seen to be $\operatorname{var} u = \operatorname{var} (\partial \log L / \partial \theta)$, and hence is equal to the variance of $t$
multiplied by the total variance of $\partial^2 \log L / \partial \theta^2$ within sets for which $t$ is constant. This
we now evaluate.

Suppose the distribution is grouped so that the “expected” frequency in the $j$th
group is $m_j$. The likelihood is then proportional to $m_1^{n_1} m_2^{n_2} \ldots$ and apart from
constants independent of $\theta$ we have

\[ \log L = \sum n_j \log m_j \tag{17.100} \]

\[ \frac{\partial \log L}{\partial \theta} = \sum n \frac{m'}{m} \tag{17.101} \]

\[ \frac{\partial^2 \log L}{\partial \theta^2} = \sum n \left( \frac{m''}{m} - \frac{m'^2}{m^2} \right), \tag{17.102} \]

where $m' = \partial m / \partial \theta$ and $m'' = \partial^2 m / \partial \theta^2$. We have at once

\[ \frac{1}{\operatorname{var} t} = -E\left( \frac{\partial^2 \log L}{\partial \theta^2} \right) = -E \sum n \left( \frac{m''}{m} - \frac{m'^2}{m^2} \right) = \sum \left( \frac{m'^2}{m} \right). \tag{17.103} \]

We shall find it most convenient to regard the $n$'s as distributed over the groups first of
all without restriction and then subject to two linear constraints expressed by $\sum (n_j) = n$
and

\[ \frac{\partial \log L}{\partial \theta} = \sum n \left( \frac{m'}{m} \right) = 0. \]

From this viewpoint the $n$'s may be regarded as
distributed in the Poisson form with mean and variance $m$ (not the binomial because we
are not introducing the restriction that the samples should be of fixed size, except as a
constraint).

Now if $\sum (k n_j)$ is a linear function of the $n$'s subject to a linear constraint $\sum (\alpha n_j) = p$,
its variance is

\[ \sum (k^2 m) - \frac{\{ \sum (k \alpha m) \}^2}{\sum (\alpha^2 m)} \tag{17.104} \]

and a second constraint reduces the variance by a term similar to the second in this expression.
The result may be seen from geometrical considerations. We may write

\[ \sum (k n) = \sum \left( k \sqrt{m} \cdot \frac{n}{\sqrt{m}} \right) \quad \text{and} \quad \sum (\alpha n) = \sum \left( \alpha \sqrt{m} \cdot \frac{n}{\sqrt{m}} \right), \]

where the variables $n / \sqrt{m}$ have unit variance and mean $\sqrt{m}$. Consider the different values
of the $n$'s, say $s$ in number, as the co-ordinates in a Euclidean space. The density function
of the variables is then symmetrical about a point $(\sqrt{m_1}, \ldots, \sqrt{m_s})$ to which we
transfer the origin. The variance of the unconstrained variables is then equal to the
reciprocal of the square of the distance from the origin to the hyperplane $\sum (k \sqrt{m} \, x) = 1$, namely, to
$\sum (k^2 m)$. But when the constraint is imposed, the variance becomes proportional to the
reciprocal of the square of the distance from the origin to the hyperplane in the direction parallel to
$\sum (\alpha \sqrt{m} \, x) = 0$ and is hence reduced by the amount

\[ \cos^2 \phi \, \sum (k^2 m), \]

where $\phi$ is the angle between the planes. This quantity is

\[ \frac{\{ \sum (k \sqrt{m} \cdot \alpha \sqrt{m}) \}^2}{\sum (k^2 m) \sum (\alpha^2 m)} \sum (k^2 m), \]

which gives us the second term in (17.104).

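Now for the first linear constraint $\sum (n) = \text{constant} = n$ we have $\alpha = 1$, and the
reducing term is (since $\sum (m) = n$ also)

\[ -\frac{1}{n} \{ \sum (k m) \}^2. \]

For the second constraint we have $\alpha = m'/m$ and hence the term is

\[ -\frac{\{ \sum (k m') \}^2}{\sum (m'^2 / m)}. \]

Thus the variance of $\sum (k n)$ is

\[ \sum (k^2 m) - \frac{1}{n} \{ \sum (k m) \}^2 - \frac{\{ \sum (k m') \}^2}{\sum (m'^2 / m)}. \tag{17.105} \]

Now taking

\[ k = \frac{m''}{m} - \frac{m'^2}{m^2} \]

and remembering that

\[ \frac{1}{\operatorname{var} t} = \sum \left( \frac{m'^2}{m} \right) \]

we see from (17.102) that the loss of information is, for large samples,

\[ \frac{1}{\sum (m'^2 / m)} \left[ \sum \left\{ m \left( \frac{m''}{m} - \frac{m'^2}{m^2} \right)^2 \right\} - \frac{1}{n} \left\{ \sum \left( \frac{m'^2}{m} \right) \right\}^2 - \frac{\left\{ \sum \left( \dfrac{m' m''}{m} - \dfrac{m'^3}{m^2} \right) \right\}^2}{\sum (m'^2 / m)} \right]. \tag{17.106} \]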

By considering the width of the groups as tending to zero we may apply this result 
also to continuous distributions. 


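Example 17.16

In the distribution

\[ dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty, \]

there is no sufficient estimator, as we have seen. Let us consider the loss of information
consequent upon using the maximum likelihood estimator.

We may write for our “expected” value

\[ m = \frac{n}{\pi} \frac{dx}{1 + (x - \theta)^2}. \]

Hence, writing $p$ for $x - \theta$,

\[ \sum \left( \frac{m'^2}{m} \right) = \frac{n}{\pi} \int_{-\infty}^{\infty} \frac{4 p^2 \, dp}{(1 + p^2)^3} = \frac{n}{2}, \]

\[ \sum \left\{ m \left( \frac{m''}{m} - \frac{m'^2}{m^2} \right)^2 \right\} = \frac{n}{\pi} \int_{-\infty}^{\infty} \frac{4 (p^2 - 1)^2}{(1 + p^2)^5} \, dp = \frac{7n}{8}, \]

\[ \sum \left( \frac{m' m''}{m} - \frac{m'^3}{m^2} \right) = 0. \]

Hence, from (17.106), the loss of information is

\[ \frac{7}{4} - \frac{1}{2} - 0 = \frac{5}{4}. \]

The intrinsic accuracy of the original distribution is $\tfrac{1}{2}$, so the loss of information is equivalent
to $2\tfrac{1}{2}$ observations for large samples. For small samples it will presumably be smaller,
since it vanishes for samples of one. The loss by use of the maximum likelihood estimator
is therefore very slight and becomes of diminishing importance as the size of the sample
increases.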
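The figures of this example can be reproduced numerically. The following is a minimal sketch, not the author's computation: it assumes that the loss is the variance of $\sum (kn)$ under the two constraints, multiplied by $\operatorname{var} t = 1/(nI)$, which per observation reads $\{E(k^2) - (Ek)^2 - E^2(ks)/E(s^2)\}/E(s^2)$ with $s = \partial \log f / \partial \theta$ and $k = \partial^2 \log f / \partial \theta^2$. The function names are hypothetical.

```python
import math

def cauchy_expectation(g, steps=20_000):
    """E[g(U)] for the density 1 / (pi * (1 + u^2)), with u = x - theta.

    The substitution u = tan(phi) reduces the integral to (1/pi) times the
    integral of g(tan(phi)) over (-pi/2, pi/2); the midpoint rule avoids the endpoints.
    """
    width = math.pi / steps
    total = 0.0
    for i in range(steps):
        phi = -math.pi / 2 + (i + 0.5) * width
        total += g(math.tan(phi))
    return total * width / math.pi

def score(u):
    """d log f / d theta for the Cauchy form."""
    return 2.0 * u / (1.0 + u * u)

def second(u):
    """d^2 log f / d theta^2, i.e. m''/m - m'^2/m^2 for one observation."""
    return (2.0 * u * u - 2.0) / (1.0 + u * u) ** 2

def cauchy_loss():
    """Return (intrinsic accuracy, loss of information, loss in equivalent observations)."""
    info = cauchy_expectation(lambda u: score(u) ** 2)
    mean_k_sq = cauchy_expectation(lambda u: second(u) ** 2)
    mean_k = cauchy_expectation(second)
    mean_ks = cauchy_expectation(lambda u: second(u) * score(u))
    loss = (mean_k_sq - mean_k ** 2 - mean_ks ** 2 / info) / info
    return info, loss, loss / info

if __name__ == "__main__":
    print(cauchy_loss())
```

The three returned values should be close to $\tfrac{1}{2}$, $\tfrac{5}{4}$ and $2\tfrac{1}{2}$ respectively.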


Ancillary Estimators 

17.40. Where no sufficient estimator exists no single estimator can avoid the loss
of information; but we may take an additional function of the variables which, together
with the maximum likelihood estimator, will give an accuracy tending to unity in large
samples. By taking a third function we can improve the accuracy still further, and so
on. The process is analogous to approximating to the value of a function (the likelihood
function) by ascertaining its differential coefficients at some particular point of the range.

In fact, suppose that, in addition to the estimator which gives $\partial \log L / \partial \theta$ for some value
of $\theta$ such as $t$, we also find $\partial^2 \log L / \partial \theta^2$ for that value. The variance of $\partial \log L / \partial \theta$ over values
in the neighbourhood of those for which these two are constant is then, to the first
approximation, the variance of

\[ \tfrac{1}{2} (t - \theta)^2 \frac{\partial^3 \log L}{\partial \theta^3}, \]

which has ordinarily a mean value and variance of lower order in $n$. In particular, if $t$
is the maximum likelihood estimator, so that

\[ \left( \frac{\partial \log L}{\partial \theta} \right)_{\theta = t} = 0, \]

the value of

\[ \left( \frac{\partial^2 \log L}{\partial \theta^2} \right)_{\theta = t} \]

may provide supplementary information which enables us to approximate more closely
to the likelihood function and hence salvage some of the lost information. Such a quantity
is accordingly called an ancillary estimator. Cf. 17.29 above.


Multivariate Distributions with One Parameter 

17.41. We now proceed to consider the extension of some of the foregoing results
in two directions: (a) where there is more than one variate but still only one parameter,
and (b) where there is more than one parameter to be estimated.

The former raises no new point of difficulty. To take the bivariate case as an example,
if the frequency function is $f(x, y, \theta)$, the likelihood is

\[ L = f(x_1, y_1, \theta) \ldots f(x_n, y_n, \theta) \tag{17.107} \]

and our maximum likelihood estimator is obtained by maximising $L$ in the usual way.


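Example 17.17

To estimate the parameter $\rho$ in samples of $n$ from

\[ dF = \frac{1}{2\pi (1 - \rho^2)^{1/2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} (x^2 - 2\rho x y + y^2) \right\} dx \, dy. \]

We find

\[ \log L = \text{constant} - \frac{n}{2} \log (1 - \rho^2) - \frac{1}{2(1 - \rho^2)} \{ \sum (x^2) - 2\rho \sum (xy) + \sum (y^2) \}, \]

whence, for $\partial \log L / \partial \rho = 0$ we have

\[ \frac{n\rho}{1 - \rho^2} - \frac{\rho}{(1 - \rho^2)^2} \{ \sum (x^2) - 2\rho \sum (xy) + \sum (y^2) \} + \frac{1}{1 - \rho^2} \sum (xy) = 0; \]

reducing to the cubic in $\rho$,

\[ n + \frac{1 + \rho^2}{\rho (1 - \rho^2)} \sum (xy) - \frac{1}{1 - \rho^2} \{ \sum (x^2) + \sum (y^2) \} = 0. \]

It is interesting to note that this does not yield the product-moment of the sample.

We have, after a little reduction,

\[ \frac{\partial^2 \log f}{\partial \rho^2} = \frac{1 + \rho^2}{(1 - \rho^2)^2} - (x^2 - 2\rho x y + y^2) \frac{1 + 3\rho^2}{(1 - \rho^2)^3} + \frac{4\rho}{(1 - \rho^2)^2} \, xy. \]

Since $E(x^2) = E(y^2) = 1$ and $E(xy) = \rho$, we have, for the estimator $\hat{\rho}$,

\[ \frac{1}{n \operatorname{var} \hat{\rho}} = -\frac{1 + \rho^2}{(1 - \rho^2)^2} + \frac{2 (1 + 3\rho^2)}{(1 - \rho^2)^2} - \frac{4\rho^2}{(1 - \rho^2)^2}, \]

whence

\[ \operatorname{var} \hat{\rho} = \frac{(1 - \rho^2)^2}{n (1 + \rho^2)}. \]

This is less (and may be considerably less) than the variance of the sample product-moment
in large samples, $(1 - \rho^2)^2 / n$. The efficiency of the latter is $1/(1 + \rho^2)$.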
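The likelihood equation of this example has no closed solution of convenient form, but it is easily solved numerically. The sketch below is an illustration only: it assumes the equation is cleared of denominators to $n\rho(1 - \rho^2) + (1 + \rho^2) \sum (xy) - \rho \{ \sum (x^2) + \sum (y^2) \} = 0$ and locates a root by bisection. The function names and the sample sums in the usage lines are hypothetical.

```python
import math

def likelihood_cubic(rho, n, sxx, syy, sxy):
    """Left-hand side of the likelihood equation after clearing denominators."""
    return n * rho * (1.0 - rho * rho) + (1.0 + rho * rho) * sxy - rho * (sxx + syy)

def solve_rho(n, sxx, syy, sxy, tol=1e-12):
    """Bisection for a root in (-1, 1).

    At rho = -1 the cubic equals sum((x + y)^2) >= 0 and at rho = +1 it equals
    -sum((x - y)^2) <= 0, so the interval brackets a root.
    """
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if likelihood_cubic(mid, n, sxx, syy, sxy) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Hypothetical sums of squares and products for a sample of ten pairs.
    n, sxx, syy, sxy = 10, 12.0, 9.0, 5.0
    rho_hat = solve_rho(n, sxx, syy, sxy)
    product_moment = sxy / math.sqrt(sxx * syy)
    print(rho_hat, product_moment)
```

With these illustrative sums the root differs from the product-moment coefficient, in keeping with the remark in the example.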


Simultaneous Estimation of Several Parameters 

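17.42. We now turn to the case when the unknown parameters are more than one
in number. To simplify the exposition we shall consider the case of two parameters $\theta_1$
and $\theta_2$, but examples not infrequently arise where more than two have to be estimated;
for instance, in the fitting of certain Pearson curves there are four. To fix the ideas,
consider the normal distribution

\[ dF = \frac{1}{\theta_2 \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{x - \theta_1}{\theta_2} \right)^2 \right\} dx, \qquad -\infty < x < \infty. \]

The likelihood function, except for constants, is given by

\[ \log L = -n \log \theta_2 - \frac{1}{2\theta_2^2} \sum (x - \theta_1)^2. \tag{17.108} \]

It is natural to generalise our principle of estimation by looking for estimators which shall
maximise $L$ for independent simultaneous variations of $\theta_1$ and $\theta_2$, i.e. to require that

\[ \frac{\partial \log L}{\partial \theta_1} = 0, \qquad \frac{\partial \log L}{\partial \theta_2} = 0. \tag{17.109} \]

In our case this leads to

\[ \sum (x - \theta_1) = 0 \]

\[ -\frac{n}{\theta_2} + \frac{1}{\theta_2^3} \sum (x - \theta_1)^2 = 0, \]

whence for the estimators $\hat{\theta}_1$ and $\hat{\theta}_2$

\[ \hat{\theta}_1 = \frac{1}{n} \sum (x) = \bar{x} \tag{17.110} \]

\[ \hat{\theta}_2^2 = \frac{1}{n} \sum (x - \bar{x})^2. \tag{17.111} \]

Thus the sample mean and variance are estimates of the population mean and variance.
We note incidentally that the estimator $\hat{\theta}_2^2$ is biassed.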


17.43. There is one possible source of confusion here which should be removed.
If we know $\theta_1$, then $\hat{\theta}_2$ is given by

\[ \hat{\theta}_2^2 = \frac{1}{n} \sum (x - \theta_1)^2, \tag{17.112} \]

which is not the same as (17.111), the sample-mean $\bar{x}$ having been replaced by the known
quantity $\theta_1$. Suppose then we estimate $\theta_1$ by $\bar{x}$, as we may do whether we know $\theta_2$ or not,
since (17.110) does not contain $\theta_2$. We may then ask, what is the estimator of $\theta_2$ which
maximises the likelihood for all samples giving the ascertained value of $\hat{\theta}_1$, namely, $\bar{x}$?

This is an entirely different question from the one which gave rise to (17.111) and we
must not be surprised if it has a different answer. The variations of $L$ from sample to
sample are now considered in a certain sub-population for which $\bar{x}$ has a fixed value.

In our particular case the problem can be solved explicitly. The likelihood function
can be thrown into the form, with variables $\bar{x}$ and $s$,

\[ L \, d\bar{x} \, ds = \frac{\sqrt{n}}{\theta_2 \sqrt{2\pi}} \exp\left\{ -\frac{n}{2\theta_2^2} (\bar{x} - \theta_1)^2 \right\} \times \frac{n^{(n-1)/2}}{2^{(n-3)/2} \, \Gamma\{ \tfrac{1}{2}(n - 1) \}} \frac{s^{n-2}}{\theta_2^{n-1}} \exp\left( -\frac{n s^2}{2\theta_2^2} \right) d\bar{x} \, ds, \tag{17.113} \]

where $s^2$ is the sample variance.

If we maximise the likelihood in this form for simultaneous variations of $\theta_1$ and $\theta_2$
we arrive back at (17.110) and (17.111), as of course we must. But if $\bar{x}$ has a fixed value,
the distribution of $s$ becomes of one lower degree of freedom. The likelihood is then
proportional to the second factor in (17.113), viz.

\[ \frac{1}{\theta_2^{n-1}} \exp\left( -\frac{n s^2}{2\theta_2^2} \right), \]

and for variations of $\theta_2$ this is maximised by

\[ \hat{\theta}_2^2 = \frac{n}{n - 1} s^2 = \frac{1}{n - 1} \sum (x - \bar{x})^2. \tag{17.114} \]

This, it may be noticed, is an unbiassed estimator.
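The two estimators of the variance are readily compared on a small sample. The sketch below is illustrative only: it takes (17.111) to be the sum of squares about the mean divided by $n$ and (17.114) the same sum divided by $n - 1$, and it evaluates the logarithm of the factor $\theta_2^{-(n-1)} \exp(-ns^2/2\theta_2^2)$ to confirm where that factor is greatest. The function names and data are hypothetical.

```python
import math

def normal_scale_estimates(xs):
    """Return (sample mean, variance with divisor n, variance with divisor n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    sum_sq = sum((x - mean) ** 2 for x in xs)
    return mean, sum_sq / n, sum_sq / (n - 1)

def conditional_log_likelihood(v, n, sum_sq):
    """Log of theta_2^{-(n-1)} exp(-n s^2 / (2 theta_2^2)), with v = theta_2^2 and n s^2 = sum_sq."""
    return -0.5 * (n - 1) * math.log(v) - sum_sq / (2.0 * v)

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    mean, joint, conditional = normal_scale_estimates(data)
    print(mean, joint, conditional)
    print(conditional_log_likelihood(conditional, len(data), 10.0))
```

The conditional factor is larger at the divisor-$(n-1)$ value than at the divisor-$n$ value, whereas the full likelihood (17.108) is greatest at the latter.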


17.44. The difference between (17.111) and (17.114) is apt to be confusing, for both
are, in a sense, maximum likelihood estimators. The distinction arises from the fact that
we are considering the variation of $L$ in two different populations, the first over all samples
of size $n$, the second over the more restricted samples subject to the further constraint
$\sum (x) = \text{constant}$. The difference when $n$ is large, of course, is quite unimportant, but
as a theoretical matter the point has some interest.

Which of the two is employed for practical estimation is a matter of choice. At first
sight it may strike the reader as objectionable to use (17.114), because $\bar{x}$ is not known before
the sample is drawn, and there are obvious dangers in basing an inference on properties
of the sample which are determined a posteriori. This objection, however, does not lie
in the present case. We make up our mind beforehand that, whatever $\bar{x}$ may turn out
to be, we will make an inference in relation to the sub-population of samples determined
by it. There is, in fact, no posterior determination of the rule of inference.

17.45. Possibly without realising it, the reader is already accustomed to make an
inference of this kind in relation to a sample number. We do not usually determine beforehand
what size the sample must be; our results (apart from the distinction between small
and large samples, which is another matter) are true for any $n$, whatever $n$ may turn out
to be in practice. In the same way the estimator (17.114) is a maximum likelihood estimator,
whatever $\bar{x}$ may turn out to be, $\bar{x}$ being a property of the sample, just as $n$ is.

The fact remains, of course, that (17.111) and (17.114) give different results. Which
is the better? The answer depends on what we require of the estimator. If we wish
to choose $\theta_1$ and $\theta_2$ so as to maximise their joint likelihood we choose (17.111). If we wish
to select them so that the likelihood is maximised for $\theta_1$ and then, for the observed $\bar{x}$, is
maximised for $\theta_2$, we choose (17.114).

17.46. It may be shown that, as for the case of one parameter, the likelihood estimators
of several parameters are consistent under very general conditions and tend for
large $n$ to be distributed in the multivariate normal form. We omit the proof of these results,
which the reader will probably be willing to accept, and proceed to a generalisation of
the theorem of 17.26. Thus:

(a) If the frequency function $f(x, \theta_1, \theta_2, \ldots \theta_p)$ is continuous in $x$, and

(b) if in a certain interval containing the true values $\theta_{10}, \theta_{20}, \ldots \theta_{p0}$, $\partial f / \partial \theta_j$ is
continuous in $\theta_j$ for every $x$, approaches a continuous function of $\theta_j$ for large
$n$, and $\partial f / \partial \theta_j$ does not vanish in some interval, then

\[ n \operatorname{cov} (\hat{\theta}_j, \hat{\theta}_k) = \frac{\Delta_{jk}}{\Delta}, \tag{17.115} \]

where $\Delta$ is the (Hessian) determinant

\[ \Delta = \left| \int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial \theta_j} \right)_{\theta_{j0}} \left( \frac{\partial \log f}{\partial \theta_k} \right)_{\theta_{k0}} f \, dx \right| \tag{17.116} \]

and $\Delta_{jk}$ is the minor of the $j$th row and $k$th column. When $p = 1$ this reduces to the
case of a single parameter.

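As $n$ tends to infinity the joint distribution of the maximum likelihood estimators
tends to the form

\[ f = k \exp\left\{ -\frac{n}{2} \sum_{j,k} H_{jk} (\hat{\theta}_j - \theta_{j0}) (\hat{\theta}_k - \theta_{k0}) \right\}. \tag{17.117} \]

The theorem will be established if we show that

\[ \int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial \theta_j} \right)_{\theta_{j0}} \left( \frac{\partial \log f}{\partial \theta_k} \right)_{\theta_{k0}} f \, dx = H_{jk}, \tag{17.118} \]

for then the values of the variances and covariances of the $\hat{\theta}$'s are as stated in (17.116).
(Compare 15.12.)

Make the transformation

\[ q_h = \sum_j \lambda_{hj} (\hat{\theta}_j - \theta_{j0}) \tag{17.119} \]

and choose the $\lambda$'s so that the exponential of (17.117) becomes

\[ \exp\left( -\frac{n}{2} \sum_h q_h^2 \right). \]

Then

\[ H_{jk} = \sum_h \lambda_{hj} \lambda_{hk}. \tag{17.120} \]

The $q$'s are independent normal variates with variance $1/n$. Hence, from the theorem for
the case of a single parameter, already proved, we have

\[ \int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial q_h} \right)^2 f \, dx = 1. \tag{17.121} \]

Further, we have

\[ \int_{-\infty}^{\infty} \frac{\partial \log f}{\partial q_h} \frac{\partial \log f}{\partial q_l} f \, dx = 0, \qquad h \neq l, \tag{17.122} \]

for if we put

\[ q_h = \frac{1}{\sqrt{2}} (u_h - u_l) \quad \text{and} \quad q_l = \frac{1}{\sqrt{2}} (u_h + u_l) \]

the expression becomes one half of

\[ \int_{-\infty}^{\infty} \left\{ \left( \frac{\partial \log f}{\partial u_h} \right)^2 - \left( \frac{\partial \log f}{\partial u_l} \right)^2 \right\} f \, dx, \]

which vanishes since the $u$'s have the same variance as the $q$'s.
Now

\[ \left( \frac{\partial \log f}{\partial \theta_j} \right)_{\theta_{j0}} = \sum_h \lambda_{hj} \frac{\partial \log f}{\partial q_h}. \]

Hence

\[ \int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial \theta_j} \right)_{\theta_{j0}} \left( \frac{\partial \log f}{\partial \theta_k} \right)_{\theta_{k0}} f \, dx = \sum_h \lambda_{hj} \lambda_{hk}, \]

in virtue of (17.121) and (17.122),

\[ = H_{jk} \]

from (17.120). The theorem follows.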
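For two parameters the rule of the theorem amounts to inverting a matrix of order two. The sketch below is an illustration under stated assumptions: it reads (17.115) as $n \operatorname{cov}(\hat{\theta}_j, \hat{\theta}_k) = $ (signed minor of the $j$th row and $k$th column) $/ \Delta$, and it uses as input the well-known information matrix of the normal distribution for the mean and the standard deviation, namely diagonal elements $1/\sigma^2$ and $2/\sigma^2$. The function names are hypothetical.

```python
def covariance_from_information(h, n):
    """Large-sample covariance matrix of two maximum likelihood estimators.

    h is the 2 x 2 matrix of E[(d log f / d theta_j)(d log f / d theta_k)]
    for a single observation; the result is its inverse divided by n.
    """
    (a, b), (c, d) = h
    det = a * d - b * c
    if det == 0:
        raise ValueError("information matrix is singular")
    return [[d / (n * det), -b / (n * det)],
            [-c / (n * det), a / (n * det)]]

def normal_information(sigma):
    """Information matrix of one normal observation for (mean, standard deviation)."""
    return [[1.0 / sigma ** 2, 0.0],
            [0.0, 2.0 / sigma ** 2]]

if __name__ == "__main__":
    cov = covariance_from_information(normal_information(2.0), n=50)
    print(cov)
```

For $\sigma = 2$ and $n = 50$ this gives variances $\sigma^2/n$ and $\sigma^2/2n$ with zero covariance, the familiar large-sample results for the mean and standard deviation.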


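Example 17.18

Let us estimate the five parameters of the bivariate normal form

\[ dF = \frac{1}{2\pi \sigma_1 \sigma_2 (1 - \rho^2)^{1/2}} \exp\left[ -\frac{1}{2(1 - \rho^2)} \left\{ \frac{(x - \alpha)^2}{\sigma_1^2} - \frac{2\rho (x - \alpha)(y - \beta)}{\sigma_1 \sigma_2} + \frac{(y - \beta)^2}{\sigma_2^2} \right\} \right] dx \, dy, \qquad -\infty < x, y < \infty. \]

It will be found that the partial differential coefficients of $\log L$ yield, on solution, the
estimators

\[ \hat{\alpha} = \bar{x}, \qquad \hat{\beta} = \bar{y} \]

\[ \hat{\sigma}_1^2 = \frac{1}{n} \sum (x - \bar{x})^2 \]

\[ \hat{\rho} \hat{\sigma}_1 \hat{\sigma}_2 = \frac{1}{n} \sum (x - \bar{x})(y - \bar{y}) \]

\[ \hat{\sigma}_2^2 = \frac{1}{n} \sum (y - \bar{y})^2 \]

so that for simultaneous estimation the sample means, variances and covariances are
estimates of the corresponding parameters.

To evaluate the sampling variances and covariances we have to evaluate integrals
of the type

\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{\partial \log f}{\partial \theta_j} \frac{\partial \log f}{\partial \theta_k} f \, dx \, dy. \]

These are easily obtainable, being merely functions of moments of different orders.

Example 17.19

Consider the Type III distribution

\[ dF = \frac{1}{\sigma \Gamma (p)} \left( \frac{x - \alpha}{\sigma} \right)^{p-1} \exp\left\{ -\left( \frac{x - \alpha}{\sigma} \right) \right\} dx, \qquad \alpha \le x < \infty. \]

For the likelihood we have

\[ \log L = -n \log \sigma - n \log \Gamma (p) + (p - 1) \sum \log \left( \frac{x - \alpha}{\sigma} \right) - \sum \left( \frac{x - \alpha}{\sigma} \right). \]

The three partial differential coefficients give

\[ -(p - 1) \sum \frac{1}{x - \alpha} + \frac{n}{\sigma} = 0 \]

\[ -\frac{np}{\sigma} + \frac{1}{\sigma^2} \sum (x - \alpha) = 0 \]

\[ \sum \log \left( \frac{x - \alpha}{\sigma} \right) - n \frac{d}{dp} \log \Gamma (p) = 0. \]

For the Hessian, taking the parameters in the order $\alpha$, $\sigma$, $p$, we have

\[ \Delta = \begin{vmatrix} \dfrac{1}{\sigma^2 (p - 2)} & \dfrac{1}{\sigma^2} & \dfrac{1}{\sigma (p - 1)} \\ \dfrac{1}{\sigma^2} & \dfrac{p}{\sigma^2} & \dfrac{1}{\sigma} \\ \dfrac{1}{\sigma (p - 1)} & \dfrac{1}{\sigma} & \dfrac{d^2 \log \Gamma (p)}{dp^2} \end{vmatrix} = \frac{1}{\sigma^4} \left\{ \frac{2}{p - 2} \frac{d^2 \log \Gamma (p)}{dp^2} - \frac{1}{p - 2} + \frac{1}{p - 1} - \frac{1}{(p - 1)^2} \right\}. \]

From this the sampling variances are found to be

\[ \operatorname{var} \hat{\alpha} = \frac{1}{n \Delta \sigma^2} \left\{ p \frac{d^2 \log \Gamma (p)}{dp^2} - 1 \right\} \]

\[ \operatorname{var} \hat{\sigma} = \frac{1}{n \Delta \sigma^2} \left\{ \frac{1}{p - 2} \frac{d^2 \log \Gamma (p)}{dp^2} - \frac{1}{(p - 1)^2} \right\} \]

\[ \operatorname{var} \hat{p} = \frac{2}{n \Delta (p - 2) \sigma^4}. \]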

Sufficient Estimators for Several Parameters

17.47. As a natural generalisation from the case of one parameter we shall say that
$t_1 \ldots t_p$ are jointly sufficient for $\theta_1 \ldots \theta_p$ if, and only if, the likelihood function can
be expressed as

\[ L(x_1 \ldots x_n, \theta_1 \ldots \theta_p) = L_1(t_1 \ldots t_p, \theta_1 \ldots \theta_p) \, L_2(x_1 \ldots x_n). \tag{17.123} \]

It evidently does not follow that if $\theta_2 \ldots \theta_p$ are known $t_1$ is sufficient for $\theta_1$. This will
be true only if the function $L_1$ may itself be factorised, e.g.

\[ L_1(t_1 \ldots t_p, \theta_1 \ldots \theta_p) = L_{11}(t_1, \theta_1 \ldots \theta_p) \, L_{12}(t_2 \ldots t_p, \theta_2 \ldots \theta_p). \tag{17.124} \]

If a case occurred in which

\[ L_1 = L_{11}(t_1, \theta_1) \, L_{12}(t_2, \theta_2) \ldots L_{1p}(t_p, \theta_p) \tag{17.125} \]

we might say that each $t$ was sufficient for the corresponding $\theta$ or that the set of $t$'s was
completely sufficient for the $\theta$'s. Such cases, however, are very rare.

Example 17.20

From (17.113) it is evident that $\bar{x}$ and $s$ are jointly sufficient for $m$ and $\sigma$. If $\sigma$ is known
$\bar{x}$ is sufficient for $m$, but if $m$ is known $s$ is not sufficient for $\sigma$. The two are not completely
sufficient.


17.48. The properties of sufficient estimators may be proved true, with certain
modifications, for several parameters, but we shall not take the subject further except
to quote one result.

If $f(x, \theta_1, \ldots \theta_p)$ is continuous and not zero over some continuous range of the $\theta$'s,
and if $\partial f / \partial \theta_j$ exists, then it is necessary and sufficient for the existence of a set of jointly sufficient
estimators that

\[ f = \exp\left\{ \sum_j A_j X_j + B + Y \right\}, \tag{17.126} \]

where $A_j$ and $B$ are arbitrary functions of the $\theta$'s, and $X_j$ and $Y$ of $x$. (See Koopman, 1936.)

Example 17.21

The Type III distribution of Example 17.19 gives us

\[ \log f = -p \log \sigma - \log \Gamma (p) + (p - 1) \log (x - \alpha) - \frac{x - \alpha}{\sigma}. \]

If $\alpha$ is regarded as known, this may be put in the form

\[ -\frac{x - \alpha}{\sigma} + (p - 1) \log (x - \alpha) - p \log \sigma - \log \Gamma (p), \]

which is of type (17.126) with

\[ A_1 = -\frac{1}{\sigma}, \qquad X_1 = x - \alpha \]

\[ A_2 = p - 1, \qquad X_2 = \log (x - \alpha) \]

\[ B = -p \log \sigma - \log \Gamma (p). \]

Thus if $\alpha$ is known, there are sufficient estimators for $\sigma$ and $p$ jointly. It will be clear on
inspection that if $\alpha$ is unknown there are no sufficient estimators, even if $\sigma$ and $p$ are known.


Parameters of Location and Scale 

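17.49. Consider a frequency function expressed in the form

\[ dF = g\left( \frac{x - \alpha}{\beta} \right) d\left( \frac{x - \alpha}{\beta} \right). \tag{17.127} \]

The parameter $\alpha$ may be regarded as locating the distribution and $\beta$ as determining its
scale. In particular the normal distribution may be put in this form. We may write

\[ dF = \exp \phi (\xi) \, d\xi = \exp \phi (\xi) \, \frac{dx}{\beta}, \tag{17.128} \]

where

\[ \xi = \frac{x - \alpha}{\beta} \quad \text{and} \quad \phi (\xi) = \log g (\xi). \]

In samples of $n$ we have

\[ \log L = \sum \phi (\xi) - n \log \beta, \]

giving for the maximum likelihood estimators

\[ \frac{\partial \log L}{\partial \alpha} = -\frac{1}{\beta} \sum \phi' = 0 \tag{17.129} \]

\[ \frac{\partial \log L}{\partial \beta} = -\frac{1}{\beta} \{ \sum (\phi' \xi) + n \} = 0, \tag{17.130} \]

whence we may solve for $\hat{\alpha}$ and $\hat{\beta}$.

For the variances and covariance we find

\[ E\left( \frac{\partial^2 \log f}{\partial \alpha^2} \right) = \frac{1}{\beta^2} E(\phi'') \]

\[ E\left( \frac{\partial^2 \log f}{\partial \beta^2} \right) = \frac{1}{\beta^2} E(\phi'' \xi^2 + 2\phi' \xi + 1) \]

\[ E\left( \frac{\partial^2 \log f}{\partial \alpha \, \partial \beta} \right) = \frac{1}{\beta^2} E(\phi' + \phi'' \xi) = \frac{1}{\beta^2} E(\phi'' \xi) = -E\left( \frac{\partial \log f}{\partial \alpha} \frac{\partial \log f}{\partial \beta} \right), \tag{17.131} \]

and the Hessian of (17.116) becomes

\[ \Delta = \frac{1}{\beta^4} \begin{vmatrix} -E(\phi'') & -E(\phi'' \xi) \\ -E(\phi'' \xi) & -E(\phi'' \xi^2 + 2\phi' \xi + 1) \end{vmatrix} \]

from which the variances and covariance of $\hat{\alpha}$ and $\hat{\beta}$ may be determined in the usual way.

In (17.131) it would be a great convenience if the quantity $E(\phi'' \xi)$ vanished, for
then $\hat{\alpha}$ and $\hat{\beta}$ would be independent. By a suitable choice of origin we can, in fact, ensure
that this is so. Put

\[ \zeta = \xi - \frac{E(\phi'' \xi)}{E(\phi'')}. \tag{17.132} \]

Then

\[ E(\phi'' \zeta) = E(\phi'' \xi) - \frac{E(\phi'' \xi)}{E(\phi'')} E(\phi'') \]

so that

\[ E(\phi'' \zeta) = 0. \]

With this origin we have for the variances of the (uncorrelated) variables $\hat{\alpha}$ and $\hat{\beta}$

\[ \operatorname{var} \hat{\alpha} = -\frac{\beta^2}{n E(\phi'')} \tag{17.133} \]

\[ \operatorname{var} \hat{\beta} = -\frac{\beta^2}{n \{ E(\phi'' \zeta^2) - 1 \}}. \tag{17.134} \]
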
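The point of location so defined, namely, as that for which $\hat{\alpha}$ and $\hat{\beta}$ are uncorrelated, has
been called by Fisher the centre of location.

Example 17.22

For the normal distribution

\[ dF = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{x - \alpha}{\beta} \right)^2 \right\} \frac{dx}{\beta} \]

we have $\phi = \text{constant} - \tfrac{1}{2} \xi^2$, so that $E(\phi'') = -1$ and $E(\phi'' \xi) = 0$.
Hence $\zeta = \xi$, and the origin chosen is itself the centre of location. From (17.133) and
(17.134) we find the familiar results (for large samples)

\[ \operatorname{var} \hat{\alpha} = \operatorname{var} \bar{x} = \frac{\beta^2}{n}, \qquad \operatorname{var} \hat{\beta} = \operatorname{var} s = \frac{\beta^2}{2n}, \]

with $\bar{x}$ and $s$ uncorrelated.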
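The passage to the centre of location is a short calculation once three mean values are known. The sketch below is a hedged illustration: it assumes the readings $\zeta = \xi - E(\phi'' \xi)/E(\phi'')$ for (17.132), $\operatorname{var} \hat{\alpha} = -\beta^2 / \{ n E(\phi'') \}$ for (17.133) and $\operatorname{var} \hat{\beta} = -\beta^2 / [ n \{ E(\phi'' \zeta^2) - 1 \} ]$ for (17.134). The function name and argument names are hypothetical.

```python
def centre_of_location(e_phi2, e_xi_phi2, e_xi2_phi2, beta, n):
    """Shift to the centre of location and the large-sample variances.

    Arguments are E(phi''), E(phi'' xi) and E(phi'' xi^2) for the standardised
    variable xi = (x - alpha) / beta, followed by the scale beta and sample size n.
    Returns (shift, var_alpha, var_beta), where zeta = xi - shift.
    """
    if e_phi2 >= 0:
        raise ValueError("E(phi'') must be negative")
    shift = e_xi_phi2 / e_phi2
    # E(phi'' zeta^2) by expanding (xi - shift)^2
    e_zeta2_phi2 = e_xi2_phi2 - 2.0 * shift * e_xi_phi2 + shift ** 2 * e_phi2
    var_alpha = -beta ** 2 / (n * e_phi2)
    var_beta = -beta ** 2 / (n * (e_zeta2_phi2 - 1.0))
    return shift, var_alpha, var_beta

if __name__ == "__main__":
    # Normal case: phi'' = -1, so the three mean values are -1, 0 and -1.
    print(centre_of_location(-1.0, 0.0, -1.0, beta=3.0, n=90))
```

With the normal values the shift is zero and the variances are $\beta^2/n$ and $\beta^2/2n$, as in the example.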


Example 17.23

Consider again the Type III distribution

\[ dF = \frac{1}{\Gamma (p)} \left( \frac{x - \alpha}{\beta} \right)^{p-1} \exp\left\{ -\left( \frac{x - \alpha}{\beta} \right) \right\} d\left( \frac{x - \alpha}{\beta} \right), \qquad \alpha \le x < \infty, \; p > 2, \]

where we assume $p$ known. The condition $p > 1$ is required to ensure the vanishing of
the frequency function at the extremity $x = \alpha$, and $p > 2$ to ensure the convergence of
some of the mean values.

Here

\[ \phi = \text{constant} - \xi + (p - 1) \log \xi. \]

Hence

\[ E(\phi'') = E\left\{ -\frac{p - 1}{\xi^2} \right\} = -\frac{1}{p - 2} \]

\[ E(\xi \phi'') = E\left\{ -\frac{p - 1}{\xi} \right\} = -1 \]

\[ E(\xi^2 \phi'') = E(-p + 1) = -(p - 1). \]

Thus

\[ \zeta = \xi - (p - 2). \]

The centre of location is distant $(p - 2)$ to the right of the start of the distribution. In
terms of $\zeta$ we have

\[ \phi = \text{constant} - \zeta - (p - 2) + (p - 1) \log (\zeta + p - 2) \]

\[ \phi' = -1 + \frac{p - 1}{\zeta + p - 2} \]

\[ \phi'' = -\frac{p - 1}{(\zeta + p - 2)^2}. \]

Hence

\[ E(\phi'') = -1/(p - 2) \]

\[ E(\phi'' \zeta^2) - 1 = -2 \]

\[ \operatorname{var} \hat{\alpha} = \frac{\beta^2 (p - 2)}{n}, \qquad \operatorname{var} \hat{\beta} = \frac{\beta^2}{2n}. \]

Efficiency of the Method of Moments

17.50. In previous chapters we have fitted distributions of the Pearson type to
other distributions by identifying lower moments. We were there mainly concerned with
the properties of populations only and no question of the reliability of estimates arose.
If, however, we regard the data as a sample from a population, the question arises whether
fitting by moments provides the most efficient estimators of the unknown parameters.
As we shall see presently, in general it does not.

Consider a parent form dependent on four parameters. If the maximum likelihood
estimators of these parameters are to be obtained in terms of linear functions of the moments
(as in the fitting of Pearson curves), we must have

\[ \frac{\partial \log L}{\partial \theta} = a_0 + a_1 \sum (x) + a_2 \sum (x^2) + a_3 \sum (x^3) + a_4 \sum (x^4) \tag{17.135} \]

and consequently

\[ f(x, \theta_1, \theta_2, \theta_3, \theta_4) = \exp \{ b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 \}, \tag{17.136} \]

where the $b$'s depend on the $\theta$'s. This is the most general form for which the method of
moments gives maximum likelihood estimators. The $b$'s are, of course, conditioned by
the fact that the total frequency shall be unity and the distribution function converge.

Without loss of generality we may take $b_1 = 0$. If, then, the other $b$'s vanish except
$b_0$ and $b_2$ the distribution is normal and the method of moments is most-efficient. In
other cases, (17.136) does not yield a Pearson distribution except as an approximation.
For example,

\[ \frac{\partial \log f}{\partial x} = 2 b_2 x + 3 b_3 x^2 + 4 b_4 x^3. \]

If $b_3$ and $b_4$ are small this is approximately

\[ \frac{d \log f}{dx} = \frac{2 b_2 x}{1 - \dfrac{3 b_3}{2 b_2} x - \dfrac{2 b_4}{b_2} x^2}, \tag{17.137} \]

which is one form of the equation defining Pearson distributions (cf. 6.2). Only when
$b_3$ and $b_4$ are small compared with $b_2$ can we expect the method of moments to give estimates
of high efficiency.


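17.51. A detailed discussion of the efficiency of moments in determining the parameters
of a Pearson distribution has been given by Fisher (1921a). We will here quote
only one of the results by way of illustration.

We found in Example 17.19 that the variance for large samples of the maximum
likelihood estimator $\hat{p}$ is given by

\[ \operatorname{var} \hat{p} = \frac{2}{n} \Big/ \left\{ 2 \frac{d^2 \log \Gamma (p)}{dp^2} - \frac{2}{p - 1} + \frac{1}{(p - 1)^2} \right\} \]

or, if $\rho = p - 1$, by

\[ \operatorname{var} \hat{p} = \frac{2}{n} \Big/ \left\{ 2 \frac{d^2 \log \Gamma (1 + \rho)}{d\rho^2} - \frac{2}{\rho} + \frac{1}{\rho^2} \right\}. \tag{17.138} \]

Now for large $\rho$,*

\[ \log \Gamma (1 + \rho) = \tfrac{1}{2} \log 2\pi + (\rho + \tfrac{1}{2}) \log \rho - \rho + \frac{1}{12\rho} - \frac{1}{360\rho^3} + \frac{1}{1260\rho^5} - \ldots \]

We then find

\[ \frac{d^2}{d\rho^2} \log \Gamma (1 + \rho) = \frac{1}{\rho} - \frac{1}{2\rho^2} + \frac{1}{6\rho^3} - \frac{1}{30\rho^5} + \ldots \]

and hence, approximately,

\[ \operatorname{var} \hat{p} = \frac{6}{n} (\rho^3 + \tfrac{1}{5} \rho). \tag{17.139} \]

If we estimate the parameters by equating sample-moments to the appropriate moments
in terms of parameters, we find

\[ \alpha + \sigma p = m_1', \qquad \sigma^2 p = m_2, \qquad 2 p \sigma^3 = m_3, \]

so that, whatever $\alpha$ and $\sigma$ may be,

\[ b_1 = \frac{m_3^2}{m_2^3} = \frac{4}{p}, \tag{17.140} \]

where $b_1$ is the sample value of $\beta_1$. Now for estimation by the method of moments (cf.
9.22),

\[ \operatorname{var} b_1 = \frac{\beta_1}{n} (4\beta_4 - 24\beta_2 + 36 + 9\beta_1 \beta_2 - 12\beta_3 + 35\beta_1), \]

which for the present distribution is readily seen to reduce to

\[ \operatorname{var} b_1 = \frac{\beta_1^2}{n} \frac{6 (p + 1)(p + 5)}{p}. \tag{17.141} \]

Hence, from (17.140) we have for $p$, estimated by the method of moments,

\[ \operatorname{var} p = \frac{p^4}{16} \operatorname{var} b_1 = \frac{6}{n} p (p + 1)(p + 5). \]

For large $p$ the efficiency of this estimator is then, from (17.139) with $p = 1 + \rho$,

\[ \frac{\rho^3 + \tfrac{1}{5} \rho}{(\rho + 1)(\rho + 2)(\rho + 6)}, \]

which is evidently short of unity in many cases. When $\rho$ exceeds 38.1 ($\beta_1 = 0.102$) the
efficiency is over 80 per cent. For $\rho = 19$ ($\beta_1 = 0.20$) it is 65 per cent. For $\rho = 4$ a more
exact calculation based on the tables of the trigamma function $d^2 \log \Gamma (1 + \rho) / d\rho^2$ shows
that the efficiency is only 22 per cent.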
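The percentages quoted can be checked by direct computation. The sketch below is illustrative and rests on stated assumptions: it takes the maximum likelihood variance in the form $n \operatorname{var} \hat{p} = 2 / \{ 2 \psi'(1 + \rho) - 2/\rho + 1/\rho^2 \}$, where $\psi'$ denotes the trigamma function, and the moment variance as $6 p (p + 1)(p + 5)/n$ with $p = 1 + \rho$. The trigamma routine (a recurrence followed by the usual asymptotic series) and all function names are supplied here for illustration and are not part of the text.

```python
import math

def trigamma(x):
    """Second derivative of log Gamma(x) for x > 0.

    Uses psi'(x) = psi'(x + 1) + 1/x^2 to raise the argument to at least 10,
    then the asymptotic series 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9).
    """
    if x <= 0:
        raise ValueError("argument must be positive")
    total = 0.0
    while x < 10.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return total + series

def efficiency_exact(rho):
    """Ratio of the maximum likelihood variance to the moment variance of p."""
    ml = 2.0 / (2.0 * trigamma(1.0 + rho) - 2.0 / rho + 1.0 / (rho * rho))
    p = 1.0 + rho
    moments = 6.0 * p * (p + 1.0) * (p + 5.0)
    return ml / moments

def efficiency_large(rho):
    """Large-rho approximation (rho^3 + rho/5) / ((rho + 1)(rho + 2)(rho + 6))."""
    return (rho ** 3 + rho / 5.0) / ((rho + 1.0) * (rho + 2.0) * (rho + 6.0))

if __name__ == "__main__":
    for rho in (38.1, 19.0, 4.0):
        print(rho, efficiency_large(rho), efficiency_exact(rho))
```

The approximation gives about 80 and 65 per cent at $\rho = 38.1$ and $\rho = 19$, and the trigamma form gives about 22 per cent at $\rho = 4$.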

* The series for the log $\Gamma$ function is given in most books on advanced calculus, e.g. J. Edwards,
Integral Calculus, vol. 2, article 942.

NOTES AND REFERENCES

The greater part of this chapter is based on the researches of R. A. Fisher, the main
papers being those of 1921a, 1925b and 1934a. The idea of maximising likelihood may
be traced back to Gauss and was considered by Edgeworth, but may be regarded as beginning
to exercise an influence on statistical theory only with the publication of Fisher’s
first paper in 1912.

The theorem giving the limiting variances and covariances of maximum likelihood
estimates was proved (incorrectly) by Karl Pearson and Filon in 1898 before it was realised
that it applied only to maximum likelihood. The necessary correction was given by Edgeworth
(1908) and Fisher (1921a), but rigorous proofs were not available until the work of
Hotelling (1930) and Doob (1934a and b, 1935, 1936). In the text we have followed
Hotelling’s treatment.

The inefficiency of moments in fitting distributions, pointed out by Fisher (1921a),
has led to some controversy, for which see Koshal (1933, 1935), Myers (1934), Elderton
and Hansmann (1934), K. Pearson (1936), and Fisher (1937a). The reader who pursues
this subject so far as to read any one of these papers should read them all.

For work on sufficient estimators see Koopman (1936) and Pitman (1936, 1937b), who
independently obtained the general form of distribution admitting such estimators. The
theorem that sufficient estimators have the property 17.17 is due to Fisher, rigorous proofs
being provided by Neyman (1935a) and Dugué (1936a). Reference should also be made
to papers by Bartlett (1936a, b, 1937c, 1938b, 1939a, 1940) on the problem of several parameters
and what he calls “conditional” statistics, i.e. those similar to when $\bar{x}$ or some
other function of the sample values is regarded as known. See also Neyman and Pearson
(1936a).

Among recent papers, that by Pitman (1939a) on parameters of scale and location,
and that by Welch (1939c) on the distribution of maximum likelihood estimates, are
noteworthy.

Geary (1942) has recently proved a remarkable generalisation of the theorem that
in large samples maximum-likelihood estimators have minimum variance in the case of
one parameter. In fact, for several parameters the maximum likelihood estimators
minimise the “generalised variance” as defined in Chapter 28.


EXERCISES 

17.1. If $t$ is a most-efficient estimator and $t'$ a less-efficient estimator with efficiency
$E$, and if the correlation of $t$ and $t'$ is $\rho$, show by considering the estimator $t''$ defined by

\[ (1 + E - 2\rho \sqrt{E}) \, t'' = (1 - \rho \sqrt{E}) \, t + (E - \rho \sqrt{E}) \, t' \]

that $\rho = \sqrt{E}$ (for in the contrary case $\operatorname{var} t'' < \operatorname{var} t$).

(Fisher, 1925b.)

17.2. If in $n$ trials of an event with probability $p$ there are $x$ successes, show that
a maximum likelihood estimator of $p$ is $x/n$. Find its sampling variance and show that
it is sufficient.

17.3. Show that the distribution

\[ dF = \tfrac{1}{2} \exp \{ -|x - \theta| \} \, dx, \qquad -\infty < x < \infty \]

has a likelihood function for a sample of $n$ which is a maximum at the median if $n$ is odd
and between the $(n/2)$th and $(n/2 + 1)$th members if $n$ is even.

17.4. For the distribution of the previous exercise show that for a sample of $(2m + 1)$
members the median has an accuracy

(m + 1) (2m + 1) f (2m) ! ] 

(m — 1) \ 22"^-! (m !)2 J ’ 

Hence, as m tends to infinity, the loss of information tends to 4 ■s/iynjn') — 4. Thus, 
although the median is most-efficient the loss of information in large samples does not 
tend to a constant. 

(Fisher, 19256.) 

17.5. Show that if a most-efficient estimator $A$ and a less-efficient estimator $B$ tend
to joint normality for large samples, $B - A$ tends to zero correlation with $A$.

Show that the error in $B$ may be regarded as composed (for large samples) of two
parts which are independent, the error in $A$ and the error in $B - A$. (The first may be
regarded as sampling error, necessarily inherent in the problem of estimation, the second
as error due to the inefficiency of the estimator.)

(Fisher, 1925b.)


17.6. Show that the distribution of the median in a sample of (2m + 1) observations 
from the population 


TT 1 -f- (^ — 6)2’ 


is given by 


(IF 


(2m + 1) ! /7t‘‘ 




(m !)2 \ 4 

where tan ^ = x — 0 and j </» j < \n. 

Show hence that the accuracy of the median is 


00 < a; <00 

‘ dx 

1 4- (a: --- 0)2’ 


(2m 4- 1) '• 
(mh)2 


1 j!., {2™^ ^ + (t ~ ( Y ^ 


d4, 


, 3m (2m - 1 - 1) (m. -f |-) ! ( j" 2m 

Z rv / T \ o f 


2 (m — 1) Tc'^ 2m 


y.' 


m — 1 


m— (^) ' 


+ ^ 


(m 4- |) ! / I r 2m ^ 


2m — 1 \ jr 


m — 1 


»-3 (&t) - 


where (z) is the Bessel function of order n and in particular { 71 ) = {27t.) = O, 

(n) = J. {2n) = - i, and 

- n " 7t 

J _ T _ J 

^n+l •'n ^n-1 

Z 

(Fisher, 1925b.)

17.7. Show that the most general continuous distribution for which the maximum
likelihood estimator of a parameter $\theta$ is the geometric mean of the sample is

f{x, Q) = exp (0) + t {x) }, 

where y) is an arbitrary function of 6, and C of x. Show further that the corresponding 
distribution giving the harmonic mean is 


/ {x, d) = exp 





(Keynes, J.B.S.S. (1911), 74, 323.) 

17.8. Show that, if m is known, the estimator 

s = 

is sufficient for a in samples of n from 

" = { ■ 27 “ ~ 

and find its distribution by the method of 17.31.


17.9. By considering the distribution 

dF = dx, 0 < a; < 00 

show that the three forms of (17.97) are not necessarily equivalent when the range contains 
the parameter to be estimated. 

(Pitman, 1936.) 


17.10. Show that if the frequency function is continuous and is zero at an extreme 
which is a function of 0, there still exists a maximum to the intrinsic accuracy, defined 



(Pitman, 1936.) 


17.11. By considering the distribution

$$dF = \frac{2x\,dx}{2\theta + 1}, \qquad \theta \le x \le \theta + 1,$$

show that the intrinsic accuracy is $4n/(2\theta + 1)^2$. Show further that the largest member
of the sample is sufficient for θ and that its distribution is

$$dF = g(x)\,dx = \frac{2nx\,(x^2 - \theta^2)^{n-1}}{(2\theta + 1)^n}\,dx.$$

Hence show that

$$E\left(\frac{\partial \log g}{\partial\theta}\right)^2 = \frac{4n^2(\theta + 1)^2}{(2\theta + 1)^2} + \frac{4n\theta^2}{(n - 2)(2\theta + 1)^2},$$

so that the mean value in this case is greater than the intrinsic accuracy.

(Pitman, 1936.)
17.12. If the frequency function of an estimator t is φ, its accuracy is

$$E\left\{\left(\frac{1}{\phi}\,\frac{\partial\phi}{\partial\theta}\right)^2\right\}.$$


If every possible sample with, frequency gave a different value of t the accuracy would 

and would be independent of t. Show that the difference in accuracy^ 

1 dcf> 


1 


EU 


may be expressed as 

^cl>dd 0 j 

and hence is not negative. 

Hence show that the efficiency as defined in 17.36 cannot exceed unity, at least if the 
range is independent of 0. 

(Fisher, 1925b.)


17.13. Show that

$$dF = \frac{1}{\pi}\,\frac{\theta_2\,dx}{\theta_2^2 + (x - \theta_1)^2}, \qquad -\infty < x < \infty,$$

does not admit of a sufficient estimator for either parameter if the other is known, or
a pair of jointly sufficient estimators if both are unknown.

(Koopman, 1936.)


17.14. Show that if a distribution admits a sufficient estimator for either of two 
parameters when the other is known, it admits of a pair of jointly sufficient estimators 
when both parameters are unknown. 

(Koopman, 1936.) 

17.15. Show that the centre of location of the Type IV distribution 

dF oc e~’’ -fl + ( — - ^ \ \ dx, — oo < x < oo 


/5 




where v and p are assumed known, is distant to the left of the mode of the distribution. 

p + •* 

(Fisher, 1921a.) 


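17.16. For the distribution

$$dF = \frac{dx}{\theta_2}, \qquad \theta_1 - \tfrac{1}{2}\theta_2 \le x \le \theta_1 + \tfrac{1}{2}\theta_2,$$

show that, in large samples, the mean tends to the form

$$dF = \frac{1}{\theta_2}\sqrt{\frac{6n}{\pi}}\,\exp\left\{-\frac{6n(\bar{x} - \theta_1)^2}{\theta_2^2}\right\} d\bar{x}.$$

Show further that the distribution of the centre of the sample, say c (the mean of the two
extreme values), tends to

$$dF = \frac{n}{\theta_2}\,\exp\left\{-\frac{2n\,|c - \theta_1|}{\theta_2}\right\} dc.$$

Hence

$$\frac{\operatorname{var} c}{\operatorname{var} \bar{x}} = \frac{6}{n},$$

so that the centre is a far better estimator of location than the mean for this distribution.

(Fisher, 1921a.)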
17.17. Show that for the Type I distribution

$$dF = \frac{1}{B(p, q)}\,x^{p-1}(1 - x)^{q-1}\,dx, \qquad 0 \le x \le 1,$$

the geometric mean of the sample values x and that of the values (1 − x) are jointly
sufficient for the estimation of p and q.

17.18. Show that all the Pearson distributions have sufficient estimators for some 
of the parameters if the others are assumed known, and ascertain which are the parameters 
concerned for each type. 

17.19. For the distribution of Exercise 17.15 show that the intrinsic accuracy for a is

$$\frac{1}{a^2}\,\frac{(p + 1)(p + 2)(p + 4)}{(p + 4)^2 + \nu^2},$$

and that the efficiency of the method of moments in locating the curve is

$$\frac{p^2(p - 1)\{(p + 4)^2 + \nu^2\}}{(p + 1)(p + 2)(p + 4)(p^2 + \nu^2)}.$$

(Fisher, 1921a.)


ESTIMATION: MISCELLANEOUS METHODS

Minimum Variance

18.1. We have seen in the previous chapter that under certain general conditions
the maximum likelihood estimator is most-efficient for large samples, and that for finite
samples it leads to sufficient estimators where such exist. Sufficient estimators themselves
contain all the information in the sample about the parameter under estimate. What
we have not shown, however, is that maximum likelihood estimators have minimum variance
in finite samples.

We now consider the subject from a slightly different standpoint. Instead of beginning
with the criteria of efficiency and sufficiency and showing that they lead to certain
minimal properties, we shall examine the class of estimators which (a) are unbiassed and
(b) have minimum variance. The minimal property is here taken as the starting-point.

18.2. Consider, then, a frequency function $f(x, \theta)$, and as usual let us write
$L = f(x_1, \theta) \ldots f(x_n, \theta)$. Then, writing $\int dx$ for the n-fold integral over the range
of the x's, we have to find $t = t(x_1, \ldots, x_n)$ such that

$$\int_{-\infty}^{\infty} t\,L\,dx = \theta, \qquad (18.1)$$

$$\int_{-\infty}^{\infty} (t - \theta)^2 L\,dx = \text{minimum}. \qquad (18.2)$$

The first equation may also be written

$$\int_{-\infty}^{\infty} (t - \theta)\,L\,dx = 0. \qquad (18.3)$$

The problem of finding t is one of the familiar problems in the Calculus of Variations. The
minimal value of (18.2) has to be found subject to the condition (18.1), which is equivalent to

$$\int_{-\infty}^{\infty} t\,\frac{\partial L}{\partial\theta}\,dx = 1, \qquad (18.4)$$

provided that the range of f is independent of θ or that f vanishes at any extreme which
depends on θ.

If 2λ is an unspecified parameter (which may depend on θ but not on the x's) the
problem is equivalent to finding an unconditioned minimum of

$$\int_{-\infty}^{\infty} \left\{(t - \theta)^2 L - 2\lambda\,t\,\frac{\partial L}{\partial\theta}\right\} dx. \qquad (18.5)$$

The solution is*

* See, for example, J. Edwards, Integral Calculus, vol. 2, article 1504, or A. R. Forsyth, Calculus
of Variations, article 15. Since the expression to be minimised does not contain $\partial t/\partial x$, the Euler equation
for a stationary value to the integral $\int V\,dx$ reduces to $\partial V/\partial t = 0$. The derivation of (18.7) is not,
$$(t - \theta)\,L - \lambda\,\frac{\partial L}{\partial\theta} = 0,$$

or

$$t = \theta + \frac{\lambda}{L}\,\frac{\partial L}{\partial\theta}. \qquad (18.6)$$

We then have

$$t = \theta + \lambda\,\frac{\partial \log L}{\partial\theta}, \qquad (18.7)$$

where t is a function of the x's but not of θ. Thus there exists a t satisfying our conditions
if we can express $\partial \log L/\partial\theta$ in the form

$$\frac{\partial \log L}{\partial\theta} = \frac{t - \theta}{\lambda}. \qquad (18.8)$$

This is a necessary and sufficient condition, except that it gives only stationary values of
(18.2) which might, for instance, be maxima instead of minima. This is not a point,
however, which need detain us from the statistical viewpoint, troublesome as it is to the
mathematician.


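Example 18.1

To estimate θ in the normal population

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\left\{-\frac{(x - \theta)^2}{2\sigma^2}\right\} dx, \qquad -\infty < x < \infty,$$

where σ is assumed known.

We have

$$\frac{\partial \log L}{\partial\theta} = \frac{n}{\sigma^2}\,(\bar{x} - \theta).$$

This can be put in the form (18.8) by taking

$$\bar{x} = t \quad \text{and} \quad \lambda = \frac{\sigma^2}{n},$$

and hence $\bar{x}$ is the required estimator. We note that it has minimum variance for any
n in the class of unbiassed estimators of θ.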
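The conclusion of Example 18.1 may be checked numerically. The following is a minimal Python sketch, offered as an illustration only: the values of θ, σ and n, the number of trials and the function name are our own assumptions. It compares the simulated variance of the sample mean with the quantity λ = σ²/n identified above.

```python
import random

def simulated_variance_of_mean(theta, sigma, n, trials, seed=0):
    """Monte Carlo variance of the mean of n normal observations."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    grand_mean = sum(means) / trials
    return sum((m - grand_mean) ** 2 for m in means) / trials

# Illustrative values (assumptions, not taken from the text).
theta, sigma, n = 3.0, 2.0, 5
lam = sigma ** 2 / n  # the lambda of (18.8) for this population
simulated = simulated_variance_of_mean(theta, sigma, n, trials=20000)
print(f"lambda = {lam:.4f}, simulated variance of the mean = {simulated:.4f}")
```

The two printed figures should agree to within sampling fluctuation, for a sample as small as n = 5.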


Example 18.2

To estimate θ in

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty.$$

We have

$$\frac{\partial \log L}{\partial\theta} = 2\,\Sigma\left\{\frac{x - \theta}{1 + (x - \theta)^2}\right\}.$$

This cannot be put in the form (18.8) and the method fails. There is no estimator which
is unbiassed and has minimum variance.


however, without its difficulties, and I think some conditions have been accidentally suppressed in 
the Aitken-Silverstone method. I understand that Dr. Leon Solomon, working with Dr. Aitken, has 
obtained a proof which depends on the fact that L shall be the product of n independent frequency 
functions. But for the war the point would doubtless have been cleared up by now, but at present 
it remains open. 
18.3. Integrating (18.8) with respect to θ we have

$$\log L = \alpha(\theta)\,(t - \theta) + \beta(\theta) + \gamma,$$

where α, β, γ are arbitrary functions (apart from the fact that the two former depend on
λ). Hence

$$\log f(x, \theta) = A(\theta)\,(t - \theta) + B(\theta) + C(x)$$

$$= p(\theta)\,t(x) + q(\theta) + r(x), \text{ say.} \qquad (18.9)$$

Comparing this with (17.83), we see that the method of minimum variance will give a
solution only if there exists a sufficient estimator. This explains the success of the method
in Example 18.1 (where $\bar{x}$ is sufficient) and its failure in Example 18.2 (where no sufficient
estimator exists).


18.4. In the method of maximum likelihood it makes no difference to the final
result whether we estimate for a parameter θ or for some other parameter χ functionally
related to θ. For

$$\frac{\partial \log L}{\partial\theta} = \frac{\partial \log L}{\partial\chi}\,\frac{\partial\chi}{\partial\theta}$$

and the two sides of the equation vanish together. In the method of minimum variance,
however, there is an interesting difference.

Suppose we wish to estimate θ in

$$dF = \frac{1}{\sqrt{2\pi\theta}}\,\exp\left(-\frac{x^2}{2\theta}\right) dx, \qquad -\infty < x < \infty.$$

We have

$$\frac{\partial \log L}{\partial\theta} = -\frac{n}{2\theta} + \frac{\Sigma(x^2)}{2\theta^2},$$

and this may be put in the form (18.8) with

$$t = \frac{1}{n}\,\Sigma(x^2) \quad \text{and} \quad \lambda = \frac{2\theta^2}{n}.$$

If, however, we consider the parallel problem of estimating σ in

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\left(-\frac{x^2}{2\sigma^2}\right) dx, \qquad -\infty < x < \infty,$$

we find

$$\frac{\partial \log L}{\partial\sigma} = -\frac{n}{\sigma} + \frac{\Sigma(x^2)}{\sigma^3},$$

which cannot be put in the form (18.8). We thus reach the peculiar result that the method
will provide an estimator for σ² but not for σ. It follows that in general we may have
to estimate, not θ itself, but some function of θ, say τ(θ).
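The factorisation claimed for the variance θ can be verified numerically. The sketch below is an illustration under our own assumptions: the data values and function names are hypothetical. It compares a finite-difference derivative of log L with the form (t − θ)/λ, where t = Σ(x²)/n and λ = 2θ²/n.

```python
import math

def log_likelihood(xs, theta):
    """log L for a normal population with zero mean and variance theta."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * theta) - sum(x * x for x in xs) / (2 * theta)

def score_numeric(xs, theta, h=1e-6):
    """Central-difference approximation to d(log L)/d(theta)."""
    return (log_likelihood(xs, theta + h) - log_likelihood(xs, theta - h)) / (2 * h)

def score_factored(xs, theta):
    """The form (t - theta)/lambda of (18.8)."""
    n = len(xs)
    t = sum(x * x for x in xs) / n
    lam = 2 * theta ** 2 / n
    return (t - theta) / lam

xs = [0.3, -1.2, 0.8, 2.1, -0.5]  # hypothetical sample
for theta in (0.5, 1.0, 2.5):
    print(theta, score_numeric(xs, theta), score_factored(xs, theta))
```

The two columns should agree at every trial value of θ, as the form (18.8) requires. No such agreement can be arranged when the parameter is taken to be σ.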


18.5. If a minimum-variance estimator exists for some τ(θ) we must have

$$\frac{\partial \log L}{\partial\tau} = \frac{t - \tau}{\lambda(\tau)},$$

which is equivalent to

$$\frac{\partial \log L}{\partial\theta} = \frac{(t - \tau)\,\dfrac{\partial\tau}{\partial\theta}}{\lambda(\theta)}. \qquad (18.10)$$

We estimate τ by putting it equal to t and thus we shall have, for the estimator,

$$\left(\frac{\partial \log L}{\partial\theta}\right)_{\tau = t} = 0. \qquad (18.11)$$

This is equivalent to the equation of maximum likelihood. The two are not, however,
identical. Maximum likelihood is not concerned with the existence of the function λ.
Minimum variance takes the function as fundamental, and when it exists the solution
(which is the same as the maximum likelihood solution) has minimum variance for all n
in the class of unbiassed estimators, not merely for large n.

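18.6. Let us suppose that θ is the parameter (transformed if necessary) for which
the estimating function is θ itself. Then we have for the minimum-variance estimator t

$$\operatorname{var} t = \int_{-\infty}^{\infty} (t - \theta)^2 L\,dx,$$

which, on substitution from (18.8), yields

$$\operatorname{var} t = \lambda^2 \int_{-\infty}^{\infty} \left(\frac{\partial \log L}{\partial\theta}\right)^2 L\,dx \qquad (18.12)$$

$$= -\lambda^2 \int_{-\infty}^{\infty} \frac{\partial^2 \log L}{\partial\theta^2}\,L\,dx \qquad (18.13)$$

if the range is independent of θ or f vanishes at any extreme dependent on θ.
Now from (18.8) we find

$$\frac{\partial^2 \log L}{\partial\theta^2} = -\frac{1}{\lambda} + (t - \theta)\,\frac{\partial}{\partial\theta}\left(\frac{1}{\lambda}\right),$$

and hence, substituting in (18.13) and remembering that $\int_{-\infty}^{\infty} (t - \theta)\,L\,dx = 0$, we find

$$\operatorname{var} t = \lambda \int_{-\infty}^{\infty} L\,dx = \lambda. \qquad (18.14)$$

The variance of the minimum-variance estimator is thus simply the parameter λ. It also
follows from (18.13) that

$$\frac{1}{\operatorname{var} t} = -\int_{-\infty}^{\infty} \frac{\partial^2 \log L}{\partial\theta^2}\,L\,dx = -n\,E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right), \qquad (18.15)$$

so that the result we reached in Chapter 17, as a limiting form for large n, is now seen to
be exact for finite n under present conditions.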


Example 18.3

To estimate θ in the Type III form

$$dF = \frac{x^{p-1}\,e^{-x/\theta}}{\Gamma(p)\,\theta^{p}}\,dx, \qquad 0 \le x < \infty, \quad p > 1,$$

where p is assumed known.

We have

$$\frac{\partial \log L}{\partial\theta} = -\frac{np}{\theta} + \frac{n\bar{x}}{\theta^2},$$

which is of the form (18.8) if

$$t = \frac{\bar{x}}{p} \quad \text{and} \quad \lambda = \frac{\theta^2}{np}.$$

Thus t is the minimum-variance estimator and has variance $\theta^2/(np)$ for finite n, even though
the distribution is not normal. (Compare Example 17.8.)
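A small simulation illustrates the result of Example 18.3. This is a sketch only: the values of p, θ and n, the seed and the number of trials are assumptions of ours. It relies on the standard-library call `random.gammavariate(alpha, beta)`, whose density has shape `alpha` and scale `beta`, so that `alpha` plays the part of p and `beta` that of θ.

```python
import random

def simulate_estimator(p, theta, n, trials, seed=4):
    """Return the simulated mean and variance of t = (sample mean) / p."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gammavariate(p, theta) for _ in range(n)]
        values.append(sum(sample) / n / p)
    mean_t = sum(values) / trials
    var_t = sum((v - mean_t) ** 2 for v in values) / trials
    return mean_t, var_t

# Illustrative values (assumptions, not taken from the text).
p, theta, n = 3.0, 2.0, 4
mean_t, var_t = simulate_estimator(p, theta, n, trials=20000)
print(f"mean of t = {mean_t:.4f} (theta = {theta})")
print(f"var of t  = {var_t:.4f} (theta^2/(np) = {theta ** 2 / (n * p):.4f})")
```

The simulated mean should lie close to θ and the simulated variance close to θ²/(np), although n is only 4.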

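18.7. We may readily determine what function τ(θ) should be taken as the estimating
function. Taking the general form from (18.9),

$$\log f(x, \theta) = p(\theta)\,t(x) + q(\theta) + r(x),$$

we have

$$\log L = p\,\Sigma\,t(x) + nq + \Sigma\,r(x),$$

$$\frac{\partial \log L}{\partial\tau} = \frac{\partial p}{\partial\tau}\,\Sigma\,t(x) + n\,\frac{\partial q}{\partial\tau}. \qquad (18.16)$$

Hence, if

$$\tau = -\frac{\partial q}{\partial\theta}\bigg/\frac{\partial p}{\partial\theta}, \qquad (18.17)$$

we have

$$\frac{\partial \log L}{\partial\tau} = \frac{\dfrac{1}{n}\,\Sigma(t) - \tau}{1\bigg/\left(n\,\dfrac{\partial p}{\partial\tau}\right)}, \qquad (18.18)$$

which is of the required form provided that

$$\lambda = 1\bigg/\left(n\,\frac{\partial p}{\partial\tau}\right). \qquad (18.19)$$

Example 18.4

Consider again the estimation of σ in

$$dF = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\left(-\frac{1}{2}\,\frac{x^2}{\sigma^2}\right) dx, \qquad -\infty < x < \infty.$$

Here

$$\log f = -\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{x^2}{2\sigma^2},$$

whence

$$p(\sigma) = -\frac{1}{2\sigma^2}, \qquad t(x) = x^2, \qquad q = -\log\sigma.$$

Thus the appropriate value of τ, from (18.17), is

$$\tau = -\frac{\partial q}{\partial\sigma}\bigg/\frac{\partial p}{\partial\sigma} = \sigma^2,$$

which is thus determined as our estimating function, the estimator itself being $\frac{1}{n}\Sigma(x^2)$.
For the variance of the estimator of τ we have

$$\lambda = 1\bigg/\left(n\,\frac{\partial p}{\partial\tau}\right) = \frac{2\sigma^4}{n}.$$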

Minimum χ²

18.8. We now turn to consider another principle which has been suggested for providing
estimators. If the data are grouped into cells with expected frequency typified
by $\lambda_j$ and observed frequency by $l_j$ then the function

$$\chi^2 = \Sigma\,\frac{(l_j - \lambda_j)^2}{\lambda_j}, \qquad (18.20)$$

where

$$n = \Sigma(\lambda_j) = \Sigma(l_j), \qquad (18.21)$$

can, as we saw in Chapter 12, be used as a measure of closeness of fit. The method of
minimum χ² adopts this standpoint (which is, of course, arbitrary in the logical sense)
and attempts to determine the parameters λ such that χ² is a minimum.

In practice the method is not very easy to apply because of the difficulty of expressing
the λ's in terms of the parameter under estimate, θ. For some illustrations reference
may be made to Kirstine Smith (1916). We shall not consider the method at length
here for two reasons:

(a) it may be shown that for large samples the minimum-χ² estimator tends to
the maximum-likelihood estimator;

(b) there is a modification of the method, considered below, which is much easier
to apply.


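18.9. For samples of fixed size n the distribution of the quantities $l_j$ is multinomial,
and we have for the likelihood function

$$L = \frac{n!}{\prod_j l_j!}\,\prod_j \left(\frac{\lambda_j}{n}\right)^{l_j}. \qquad (18.22)$$

Thus

$$\log L = \text{constant} + \Sigma\,l_j \log \lambda_j. \qquad (18.23)$$

Now for large samples we may put

$$\lambda_j = l_j + a_j\,n^{1/2},$$

where $a_j$ is finite, so that $a_j n^{1/2}$ is small compared with $l_j$; $|a_j n^{1/2}| < l_j$; and $\Sigma(a_j) = 0$.
Hence, from (18.23),

$$\log L = \text{constant} + \Sigma\,l_j \log l_j + \Sigma\,l_j \log\left(1 + \frac{a_j n^{1/2}}{l_j}\right)$$

$$= k + \Sigma\left\{a_j n^{1/2} - \frac{a_j^2\,n}{2 l_j}\right\} + O(n^{-1/2})$$

$$= k - \tfrac{1}{2}\,\Sigma\,\frac{(l_j - \lambda_j)^2}{l_j} + O(n^{-1/2}). \qquad (18.24)$$

Now write

$$\chi'^2 = \Sigma\,\frac{(l_j - \lambda_j)^2}{l_j} = \Sigma\,\frac{\lambda_j^2}{l_j} - n. \qquad (18.25)$$

Then we see that, to order $n^{-1/2}$, L is maximised by minimising χ'². This latter quantity
is not the same as χ² because the denominator terms are l's instead of λ's. However, for
large n the difference is of order $n^{-1/2}$, for

$$\chi^2 - \chi'^2 = \Sigma\,\frac{(l_j - \lambda_j)^2}{l_j}\left\{\left(1 + \frac{a_j n^{1/2}}{l_j}\right)^{-1} - 1\right\} = -\Sigma\,\frac{a_j^3\,n^{3/2}}{l_j^2} + \ldots = O(n^{-1/2}).$$

Hence, to order $n^{-1/2}$ the estimates obtained by minimising either χ² or χ'² will be equivalent
to maximising L.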


18.10. The advantage of using χ'² instead of χ² in practice resides in the fact that
the denominators in the former are integral. However, if there are any empty cells (i.e.
those for which $l_j = 0$) the formula (18.25) requires some modification.

In the likelihood function, if $l_j = 0$, $(\lambda_j/n)^{l_j} = 1$ for all $\lambda_j$. The substitution

$$\lambda_j = l_j + a_j\,n^{1/2}$$

will give us, for the empty cells, a term in (18.24) equal to $-\Sigma\,a_j n^{1/2} = -\Sigma\,\lambda_j = -M$,
say. Hence we have

$$\chi'^2 = \Sigma\,\frac{(l_j - \lambda_j)^2}{l_j} + 2M, \qquad (18.26)$$

where the summation takes place over occupied cells and M is the sum of the theoretical
frequencies λ in the empty cells.


Example 18.5

As an example (Jeffreys, 1941) we consider a case where the maximum likelihood
estimator is known, so that a comparison may be made with the result given by
minimum χ'².

Col. (2) of the following table shows the frequency of women in the first class of Part II
of the Mathematical Tripos from 1910 to 1938 inclusive. Assuming that this distribution
follows the Poisson distribution $e^{-\theta}\theta^{j}/j!$, to estimate θ.

   (1)         (2)               (3)                      (4)
Number of   Frequency      Expectation λ_j         Contribution to χ'²
firsts, j      l_j      θ = 1   θ = 1.5   θ = 2   θ = 1   θ = 1.5   θ = 2

    0           6       10.7      6.5      3.9     3.7      0.0      0.7
    1           8       10.7      9.7      7.9     0.9      0.4      0.0
    2          11        5.3      7.3      7.9     3.0      1.2      0.9
    3           3        1.8      3.6      5.2     0.6      0.1      1.6
    4           0        0.5      1.4      2.6      -        -        -
    5           1        0.1      0.4      1.0     0.8      0.4      0.0
  over 5        0        0.0      0.1      0.5      -        -        -

   2M                                              0.9      3.0      6.2

  Totals       29                                  9.9      5.1      9.4

The sample mean (a sufficient estimator of θ) is in this case 44/29 = 1.52 with a standard
error 0.23.

To apply minimum χ'² we have to express the theoretical frequencies in terms of θ.
This results in an unmanageable equation if we then substitute in χ'². Instead we calculate
the minimum by finding χ'² for some trial values of θ (in this case 1, 1.5 and 2) and
then interpolating.

The expectations λ for the three selected values of θ are shown in column (3) of the
table and the corresponding contributions to χ'² in column (4). It is found that, writing θ = 1.5 + φ,
the values of χ'² may be represented by the quadratic

$$\chi'^2 = 5.1 - 0.5\,\phi + 18.2\,\phi^2.$$

The minimum of this is given by φ = 0.01, and hence our estimate of θ is 1.51, very close
to the value of 1.52 given by the maximum likelihood estimator.
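The arithmetic of Example 18.5 can be repeated mechanically. The Python sketch below is our own illustration: the function names are hypothetical and the expectations are carried unrounded, so that individual figures may differ slightly from the one-decimal entries of the table. It evaluates χ'² from (18.26) at the three trial values and locates the minimum of the quadratic through them.

```python
import math

# Observed frequencies l_j for j = 0, 1, ..., 5 and "over 5".
observed = [6, 8, 11, 3, 0, 1, 0]

def expected(theta, total=29):
    """Poisson expectations for j = 0..5, with the remainder in 'over 5'."""
    lam = [total * math.exp(-theta) * theta ** j / math.factorial(j) for j in range(6)]
    lam.append(total - sum(lam))
    return lam

def chi_prime_sq(theta):
    """Modified chi-square of (18.26): occupied cells plus 2M for empty cells."""
    lam = expected(theta)
    occupied = sum((l - m) ** 2 / l for l, m in zip(observed, lam) if l > 0)
    empty = sum(m for l, m in zip(observed, lam) if l == 0)
    return occupied + 2 * empty

def quadratic_minimum(x0, h, f):
    """Minimum of the parabola through f at x0 - h, x0 and x0 + h."""
    y_minus, y0, y_plus = f(x0 - h), f(x0), f(x0 + h)
    a = (y_minus + y_plus - 2 * y0) / (2 * h * h)
    b = (y_plus - y_minus) / (2 * h)
    return x0 - b / (2 * a)

theta_hat = quadratic_minimum(1.5, 0.5, chi_prime_sq)
print("chi'^2 at 1, 1.5, 2:", [round(chi_prime_sq(t), 2) for t in (1.0, 1.5, 2.0)])
print("interpolated estimate of theta:", round(theta_hat, 3))
```

Three evaluations suffice here because χ'² is nearly parabolic in the neighbourhood of its minimum. With coarser trial values a further round of interpolation about the first estimate would be advisable.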


18.11. On theoretical grounds there seems no reason to use minimum χ² instead of
maximum likelihood. The method has some practical value, however, where the maximum
likelihood equations are difficult to solve. We can usually follow the device of the
example just given, find χ² or χ'² for some trial values of the parameter, and approximate
to the value which minimises χ² or χ'². Whether this is easier than finding the maximum
likelihood estimate in the same sort of way depends on the circumstances of the case, but
it may well be so when the frequency function is a tabulated integral, so that expected
frequencies for specified parameter-values can be readily obtained.


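18.12. In the manner of 17.39 we can estimate the loss of information occasioned
by the use of minimum χ². We have, for the minimum of χ²,

$$\frac{\partial}{\partial\theta}\,\Sigma\,\frac{(l - \lambda)^2}{\lambda} = 0,$$

which reduces to

$$\Sigma\,\frac{l^2}{\lambda^2}\,\frac{\partial\lambda}{\partial\theta} = 0. \qquad (18.27)$$

Since $\Sigma\,\partial\lambda/\partial\theta = 0$, the left-hand side may be written $\Sigma\,\{(l - \lambda)(l + \lambda)/\lambda^2\}\,\partial\lambda/\partial\theta$, and since
$(l + \lambda)/\lambda$ tends to the constant value 2 for large samples, this is equivalent to the
maximum likelihood equation

$$\Sigma\,\frac{l - \lambda}{\lambda}\,\frac{\partial\lambda}{\partial\theta} = 0, \qquad (18.28)$$

confirming that maximum likelihood and minimum χ² give the same results in the limit.
Since

$$l^2 - \lambda^2 = 2\lambda\,(l - \lambda) + (l - \lambda)^2$$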

the deviation of ^ from its mean is 

od 


^ -X^dX 1 

^ A2 ~dd ^ 12 dd’ 


( 18 . 29 ) 


the first term vanishing on summation. As in 17.39 we find the variance of this quantity 
within samples for which is constant. We have 


Ya,rZk{l - A)2 = 2 E {Ic^X^) --E^ {kX'^) - 2 

n 


and on substituting k — 



we find 


E^ (H'2) 




. ( 18 . 30 ) 


giving the loss of information.

As the sample size increases, this quantity remains finite. It is interesting to observe,
however, that as the number of classes increases it also increases without limit, indicating
that minimum χ² breaks down for fine grouping.


“Inverse” Probability

18.13. According to Bayes’ theorem (7.24), if h(θ) dθ is the prior probability of θ,
the posterior probability is given by

$$P(\theta \mid x_1, \ldots, x_n) \propto L(x_1, \ldots, x_n, \theta)\,h(\theta)\,d\theta. \qquad (18.31)$$

It is then easy to determine the “most probable” value of θ by maximising L h(θ) if we
know h(θ). The principles of inference with which we have been concerned up to the
present do not require the notion of the probability of θ and, even if they did, would not
give any guide to the nature of the function h(θ). In fact, to an adherent of the frequency
theory of probability, the prior probability of θ requires the distribution of θ in some form,
and if θ is merely an unknown constant it has no distribution (except the trivial one that
f = 1 when θ takes its true value and f = 0 elsewhere). The alternative school of thought
assumes the existence of h(θ) as denoting a prior measure of belief, but, in order to find
the most probable value of θ, has to make some further assumption as to its values comparable
to Bayes’ postulate that for a finite range h is a constant.

We have already noted that on this assumption the maximisation of L is equivalent
to finding the value of θ with the greatest posterior probability. It is also interesting to
note that, whatever the form of h(θ), maximum likelihood tends to give the same estimator
as the method of maximising posterior probability for large n. In fact, for the maximisation
of P in (18.31) we have

$$\frac{\partial \log P}{\partial\theta} = \frac{\partial \log L}{\partial\theta} + \frac{\partial \log h}{\partial\theta} = 0. \qquad (18.32)$$

In ordinary cases the variance of $\partial \log L/\partial\theta$ is of order n whereas the second term is
independent of n. In the limit, therefore, the second term is negligible and we are reduced to
the likelihood equation

$$\frac{\partial \log L}{\partial\theta} = 0.$$
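The convergence described in 18.13 is easily exhibited in a case where the posterior mode has a closed form. The sketch below rests on assumptions of our own which the text does not make: the observations are normal with known variance and the prior h(θ) is itself normal. The function names, the seed and the numerical values are hypothetical.

```python
import random

def mle(xs):
    """Maximum likelihood estimate of a normal mean: the sample mean."""
    return sum(xs) / len(xs)

def posterior_mode(xs, sigma2, prior_mean, prior_var):
    """Mode of the posterior when h(theta) is normal(prior_mean, prior_var)
    and the observations are normal with known variance sigma2."""
    n = len(xs)
    precision = n / sigma2 + 1.0 / prior_var
    return (sum(xs) / sigma2 + prior_mean / prior_var) / precision

rng = random.Random(3)
gaps = []
for n in (10, 100, 1000):
    xs = [rng.gauss(2.0, 1.0) for _ in range(n)]  # true mean 2, unit variance
    gap = abs(posterior_mode(xs, 1.0, prior_mean=0.0, prior_var=1.0) - mle(xs))
    gaps.append(gap)
    print(n, round(gap, 5))
```

The gap between the two estimates falls roughly as 1/n, in keeping with the remark that the term in h is independent of n while the likelihood term grows with it.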


Least Squares 

18.14. The method of least squares bears an analogy to minimum χ². Suppose
we have an expression depending on a number of unknown parameters $\theta_1, \ldots, \theta_p$ and
certain observed values x. This can be thrown into a form such as

$$k(x, \theta_1, \ldots, \theta_p) = 0, \qquad (18.33)$$

where k is a given function (not a frequency function). If we have n values of x and n > p
it is not possible to solve the n resulting equations of type (18.33) for the θ's. We then
consider the “residuals” $k(x_j, \theta_1, \ldots, \theta_p)$, and the principle of least squares states that
the values of $\theta_1, \ldots, \theta_p$ are to be chosen so that

$$\Sigma\,\{k(x_j, \theta_1, \ldots, \theta_p)\}^2 = \text{minimum}, \qquad (18.34)$$

or, in other words, so as to satisfy the p equations

$$\frac{\partial}{\partial\theta_l}\,\Sigma\,\{k(x_j, \theta_1, \ldots, \theta_p)\}^2 = 0, \qquad l = 1, \ldots, p. \qquad (18.35)$$


18.15. Consider the case when the residuals are all distributed normally with variance
σ². The logarithm of the likelihood is then (except for constants)

$$\log L = -n \log\sigma - \frac{1}{2\sigma^2}\,\Sigma\,k^2(x_j, \theta_1, \ldots, \theta_p), \qquad (18.36)$$

and this is clearly maximised by minimising the sum (18.34). In this case, then, the method
of least squares is equivalent to the method of maximum likelihood. In other cases it
may give different results, and the justification for using it then becomes more or less
empirical.
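The equivalence asserted in 18.15 may be illustrated for the simplest case. In the sketch below we assume, for illustration only, residuals of the form k = y − b0 − b1 x; the data values and function names are hypothetical. It fits the line by least squares and confirms that the normal log-likelihood of (18.36) is lower at neighbouring parameter values.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def sum_of_squares(xs, ys, b0, b1):
    """Sum of squared residuals, as in (18.34)."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

def log_likelihood(xs, ys, b0, b1, sigma):
    """Normal log-likelihood of (18.36), constants omitted."""
    return -len(xs) * math.log(sigma) - sum_of_squares(xs, ys, b0, b1) / (2 * sigma ** 2)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(xs, ys)
sigma = 1.0                      # any fixed sigma gives the same ordering
best = log_likelihood(xs, ys, b0, b1, sigma)
perturbed = [
    log_likelihood(xs, ys, b0 + d0, b1 + d1, sigma)
    for d0 in (-0.2, 0.0, 0.2)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]
print("log L at the least-squares values:", round(best, 4))
print("largest log L among perturbed values:", round(max(perturbed), 4))
```

Since σ enters (18.36) only through a term free of the θ's and a positive multiplier of the sum of squares, the ordering of the likelihoods is the same whatever fixed value of σ is used.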

18.16. The most important case occurring in statistical theory of the use of the
method of least squares concerns regression equations. We have already seen that the
coefficients of regression are, in effect, determined so as to minimise the sum of squares of
residuals (cf. 15.2). We also know that, for the multiple normal distribution, residuals
from the population regression lines are, in fact, normally distributed (15.13). For normal
variation, therefore, the method of least squares is equivalent to maximum likelihood so
far as concerns the simultaneous estimation of regression coefficients.

18.17. This is a convenient point to prove a theorem (due to Gauss) which in one
form or another is constantly occurring in statistical theory, particularly in connection
with the normal distribution. Suppose we have a population (not necessarily normal)
in which the regression of one variate y on the others $x_0\,(= 1), x_1, \ldots, x_p$ is given by

$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p. \qquad (18.37)$$

The x's may be correlated among themselves and, in the extreme case, functionally related,
so that this case includes that of curvilinear regression for our present purposes. Suppose
that we have a sample of n values, where n > p + 1. Denoting by Σ summation over these
n values, we determine the estimates of the β's by minimising the sum of squares, i.e.

$$\Sigma\,(y - \beta_0 - \beta_1 x_1 - \ldots - \beta_p x_p)^2.$$

Suppose that $b_0, \ldots, b_p$ are the solutions of this process. Then our regression formula is

$$y - b_0 - b_1 x_1 - \ldots - b_p x_p = 0. \qquad (18.38)$$

The observed residuals, obtained by substituting the observed values in this equation,
are typified by

$$e = y - b_0 - b_1 x_1 - \ldots - b_p x_p, \qquad (18.39)$$

whereas the “real” residuals are typified by

$$\epsilon = y - \beta_0 - \beta_1 x_1 - \ldots - \beta_p x_p. \qquad (18.40)$$

We proceed to compare the sampling variances of e and ε and to show that

$$\operatorname{var} \epsilon = \frac{n}{n - p - 1}\,\operatorname{var} e, \qquad (18.41)$$

provided that the residuals are uncorrelated.

Let us transform the observed values of the x's to new values $\xi_0, \ldots, \xi_p$ (n for
each) such that

$$\Sigma\,(x_j\,\xi_k) = 1, \quad j = k; \qquad \Sigma\,(x_j\,\xi_k) = 0, \quad j \ne k; \qquad \Sigma\,(\xi_k\,y) = b_k. \qquad (18.42)$$

This involves, for each ξ, p + 1 equations in n unknowns and is therefore possible in general.
We then have

$$\Sigma\,\xi_k (e - \epsilon) = \Sigma\,\xi_k \{(\beta_0 - b_0) + (\beta_1 - b_1) x_1 + \ldots + (\beta_p - b_p) x_p\} = \beta_k - b_k.$$

But

$$\Sigma\,\xi_k e = \Sigma\,(\xi_k y) - \Sigma\,\xi_k \{b_0 + b_1 x_1 + \ldots + b_p x_p\} = b_k - b_k = 0.$$

Hence

$$b_k - \beta_k = \Sigma\,\xi_k \epsilon. \qquad (18.43)$$

Now

$$\Sigma\,e (\epsilon - e) = \Sigma\,(y - b_0 - \ldots - b_p x_p) \{(b_0 - \beta_0) + \ldots + (b_p - \beta_p) x_p\} = 0,$$

since the summations give terms the vanishing of which determines the b's. Hence

$$\Sigma\,\epsilon^2 - \Sigma\,e^2 = \Sigma\,(\epsilon - e)\,\epsilon = S\,(b_j - \beta_j)\,\Sigma\,x_j \epsilon,$$

where S denotes summation over the (p + 1) values of j,

$$= S\,\{\Sigma\,\xi_j \epsilon\;\Sigma\,x_j \epsilon\}$$

$$= S\,\{\Sigma\,\xi_j x_j \epsilon^2\} + \text{cross-product terms in } \epsilon.$$

When we take expectations the cross-product terms vanish since the residuals are uncorrelated.
Hence

$$E(\Sigma\,\epsilon^2) - E(\Sigma\,e^2) = E\,S\,\Sigma\,(\xi_j x_j \epsilon^2) = (p + 1)\,\operatorname{var} \epsilon,$$

or

$$(n - p - 1)\,\operatorname{var} \epsilon = n\,\operatorname{var} e, \qquad (18.44)$$

from which (18.41) follows at once.

For normal variation we shall consider this result from a slightly different viewpoint
in Chapter 22.
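The relation (18.44) can be examined by simulation for a single regressor (p = 1). The sketch below is an illustration under our own assumptions: the x-values, the regression coefficients, the seed and the function names are hypothetical, and the residuals are drawn from a rectangular distribution of unit variance to emphasise that normality is not required. The average of Σe² over many samples should then be close to (n − p − 1) var ε.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def mean_residual_sum_of_squares(xs, beta0, beta1, trials, seed=2):
    """Average of the observed residual sum of squares over repeated samples."""
    rng = random.Random(seed)
    half_range = math.sqrt(3.0)  # uniform on (-sqrt 3, sqrt 3) has variance 1
    total = 0.0
    for _ in range(trials):
        ys = [beta0 + beta1 * x + rng.uniform(-half_range, half_range) for x in xs]
        b0, b1 = fit_line(xs, ys)
        total += sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return total / trials

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
n, p = len(xs), 1
avg_ss = mean_residual_sum_of_squares(xs, beta0=1.0, beta1=0.5, trials=20000)
print("average sum of squared observed residuals:", round(avg_ss, 3))
print("(n - p - 1) * var(eps):", (n - p - 1) * 1.0)
```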


NOTES AND REFERENCES 

The approach to minimum-variance estimators through the Calculus of Variations is
due to Aitken and Silverstone (1942). For minimum χ² see K. Smith (1916) and R. A.
Fisher (1922a, 1925b). For the modification χ'² see Jeffreys (1938b, 1939b, 1941).

A method of estimation essentially depending on the median has been proposed for
use in quality control, but its value is as yet problematical. For an account of the technique
see Simon (1941).


EXERCISES 

18.1. From the property that the variance of a minimum-variance estimator is
equal to λ show that the most general distribution for which the sample mean is a sufficient
estimator is


/ {;x, 0) = c {x, cr) exp 




where c is an arbitrary function and σ² is the variance of f.

Hence show that no Pearson curve other than the normal admits the sample-mean
as a sufficient estimator, but that a Gram-Charlier series may do so.

(Aitken and Silverstone, 1942.) 


18.2. If the function λ exists and

a (6) 


r 


X {oy 


show that the variance of the estimator t is 

1 d-q 
n 9a^’ 

where q is the function of 18.7. 


(Aitken and Silverstone, 1942.) 


18.3. If a population $(p + q)^4$ is regarded as distributed in 5 classes, show that the
intrinsic accuracy is $4/(pq)$. Show further that the loss of information through estimating
p from minimum χ² is

(3p* - 2 m + Sq^) - -2p=q + q‘ - + 2 *)'. 

pi qi 2p^ 

This is least when p — q and is then equivalent to the loss of 5 observations. 

(Fisher, 1925b.)



CHAPTER 19

CONFIDENCE INTERVALS

19.1. In the previous two chapters we have been concerned with methods which
will provide an estimate of the value of one or more unknown parameters; and the methods
gave functions of the sample values (the estimators) which, for any given sample, provided
a unique estimate. It was of course fully recognised that the estimate might differ
from the parameter in any particular case, and hence that there was a margin of uncertainty.
The extent of this uncertainty was expressed in terms of the sampling variance
of the estimator. With the somewhat intuitional approach which has served our purpose
up to this point, we say that it is probable that θ lies in the range $t \pm \sqrt{\operatorname{var} t}$, very probable
that it lies in the range $t \pm 2\sqrt{\operatorname{var} t}$, and so on. In short, what we have done is in effect
to locate θ in a range and not at a particular point, although we have regarded one point
in the range, viz. t itself, as having a claim to be considered as the “best” estimate of θ.

19.2. In the present chapter we shall examine the logic of this procedure more
closely and look at the problem of estimation from a different point of view. We now
abandon attempts to estimate θ by a function which, for a specified sample, gives a unique
number. Instead we shall consider merely the specification of a range in which θ lies.
We shall not attempt to specify whereabouts in the interval the value of θ really is; all
values in the range have an equal claim to be taken as the “true” value. Nor shall we
assess the probability that θ lies in the interval in the sense that θ is regarded as a random
variable. In fact, in the frequency theory of probability θ is not a random variable (except
trivially in that the frequency of θ is unity when it takes the true value and is zero elsewhere).
Nevertheless, probability plays an essential part in the determination of the
interval and in the degree of confidence we have that it “covers” θ.

Case of one Unknown Parameter 

19.3. Consider in the first place a population dependent on a single unknown parameter
θ and suppose that we are given a random sample of n values $x_1, \ldots, x_n$ from the
population. Let z be a statistic dependent on the x's and on θ, whose sampling distribution
is independent of θ. (The examples given below will show that in some cases at least such
a statistic may be found.) Then, given any probability α, we can find a value $z_1$ such that

$$\int_{-\infty}^{z_1} dF(z) = \alpha,$$

and in the notation of the theory of probability we shall then have

$$P(z \le z_1 \mid \theta) = \alpha. \qquad (19.1)$$

It may happen that the inequality $z \le z_1$ can be transformed to the form $\theta \le t_1$ or
$\theta \ge t_1$, where $t_1$ is some function depending on the value $z_1$ and the x's but not on θ. For
instance, if $z = \bar{x} - \theta$ we shall have

$$\bar{x} - \theta \le z_1$$

and hence

$$\theta \ge \bar{x} - z_1.$$

If this transformation can be made we then have, from (19.1),

$$P(\theta \le t_1 \mid \theta) = \alpha. \qquad (19.2)$$

More generally, suppose that we can find a function $t_1$ depending on α and the x's
but not on θ, such that (19.2) is true for all θ. Then we may use this equation in probability
to make certain statements about θ.

19.4. Note, in the first place, that we cannot assert that the probability is α that
θ does not exceed a constant $t_1$. This statement (in the frequency theory of probability)
can only relate to the variation of θ in a population of θ's, and in general we do not know
that θ varies at all. If it is merely an unknown constant then the probability that $\theta \le t_1$
is either unity or zero. We do not know which of these values is correct, but we do know
that one of them is correct.

We therefore look at the matter in another way. Although θ is not a random variable,
$t_1$ is and will vary from sample to sample. Consequently, if we assert that $\theta \le t_1$ in each
case presented for decision, we shall be right in a proportion α of the cases in the long run.
The statement that the probability of θ is less than or equal to some assigned value
has no meaning except in the trivial sense already mentioned; but the statement that
a statistic $t_1$ is greater than or equal to θ (whatever θ happens to be) has a definite probability
α of being correct. If therefore we make it a rule to assert the inequality $\theta \le t_1$
for any sample values which arise, we have the assurance of being right in a proportion
α of the cases “on the average” or “in the long run.”

This idea is basic to the theory of confidence intervals which we proceed to develop,
and the reader should satisfy himself that he has grasped it.

19.5. To simplify the exposition we have considered only a single quantity $t_1$ and
the statement that $\theta \le t_1$. In practice, however, we usually seek for two quantities $t_0$
and $t_1$, such that

$$P\{t_0 \le \theta \le t_1 \mid \theta\} = \alpha, \qquad (19.3)$$

and make the assertion that θ lies in the range $t_0$ to $t_1$. These quantities are known as the
Lower and Upper Confidence Limits respectively. They depend only on α and the sample
values. For any fixed α the totality of values of $t_0$ and $t_1$ for different samples determine
a field within which θ is asserted to lie. This field is called the Confidence Belt or Region
of Acceptance. We shall give a graphical representation of the idea below. The number
α is called the Confidence Coefficient.


Example 19.1

Suppose we have a sample of n from the normal population with unit variance

$$dF = \frac{1}{\sqrt{2\pi}}\,\exp\{-\tfrac{1}{2}(x - \mu)^2\}\,dx, \qquad -\infty < x < \infty.$$

The distribution of means $\bar{x}$ will be

$$dF = \sqrt{\frac{n}{2\pi}}\,\exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\} d\bar{x}, \qquad -\infty < \bar{x} < \infty.$$

From the tables of the normal integral we know that the probability of a positive deviation
from the mean not greater than twice the standard deviation is 0.97725. We have
then

$$P\left(\bar{x} - \mu \le \frac{2}{\sqrt{n}} \,\Big|\, \mu\right) = 0.97725,$$

which is equivalent to

$$P\left(\bar{x} - \frac{2}{\sqrt{n}} \le \mu \,\Big|\, \mu\right) = 0.97725.$$

Thus, if we assert that μ is greater than or equal to $\bar{x} - 2/\sqrt{n}$ we shall be right in about
97.725 per cent. of the cases.

Similarly we have

$$P\left(\bar{x} - \mu \ge -\frac{2}{\sqrt{n}} \,\Big|\, \mu\right) = P\left(\mu \le \bar{x} + \frac{2}{\sqrt{n}} \,\Big|\, \mu\right) = 0.97725.$$

Hence, combining the two results,

$$P\left(\bar{x} - \frac{2}{\sqrt{n}} \le \mu \le \bar{x} + \frac{2}{\sqrt{n}} \,\Big|\, \mu\right) = 2\,(0.97725) - 1 = 0.9545.$$

Hence, if we assert that μ lies in the range $\bar{x} \pm 2/\sqrt{n}$ we shall be right in about 95.45 per
cent. of the cases in the long run.

Conversely, given the confidence coefficient we can easily find from the tables of the
normal integral the deviation $d$ such that

$$P\{\bar{x} - d/\sqrt{n} \le \mu \le \bar{x} + d/\sqrt{n} \mid \mu\} = \alpha.$$

For instance, if $\alpha = 0.8$, $d = 1.28$, so that if we assert that $\mu$ lies in the range
$\bar{x} \pm 1.28/\sqrt{n}$ the odds are 4 to 1 that we shall be right.
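
The long-run reading of these statements can be checked numerically. The following Python sketch is an illustration added to the example, not part of the original argument; the function names, the seed, and the choices $\mu = 3$, $n = 16$ are arbitrary assumptions. It uses the standard library's normal distribution in place of the printed tables, finds the deviation $d$ for a given coefficient, and counts how often the range $\bar{x} \pm d/\sqrt{n}$ covers the true mean in repeated sampling.

```python
import random
from statistics import NormalDist

def deviation_for(alpha):
    """Deviate d with P(-d <= Z <= d) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf((1 + alpha) / 2)

def coverage(mu, n, d, trials=20000, seed=1):
    """Fraction of simulated samples for which
    xbar - d/sqrt(n) <= mu <= xbar + d/sqrt(n)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    half = d / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        if xbar - half <= mu <= xbar + half:
            hits += 1
    return hits / trials

d80 = deviation_for(0.8)              # close to the tabulated 1.28
cov2 = coverage(mu=3.0, n=16, d=2.0)  # should be near 0.9545
print(round(d80, 3), round(cov2, 4))
```

The simulated proportion depends on the seed and the number of trials, so it should be read as agreeing with 0.9545 only to within sampling error.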

The reader to whom this approach is new will probably ask: but is this not a roundabout
way of using the standard error to set limits to an estimate of the mean? In a
way, it is. In effect, what we have done in this example is to show how the use of the
standard error of the mean in normal samples may be justified on logical grounds without
appeal to new principles of inference other than those incorporated in the theory of
probability itself. In particular we make no use of Bayes' postulate.

Another point of interest in this example is that the upper and lower confidence limits
derived above are equidistant from the mean $\bar{x}$. This is not by any means necessary,
and it is easy to see that we can derive any number of alternative limits for the same
confidence coefficient $\alpha$. Suppose, for instance, we take $\alpha = 0.9545$, and select two numbers
$\alpha_0$ and $\alpha_1$ which obey the condition

$$\alpha_0 + \alpha_1 - 1 = 0.9545,$$

say $\alpha_0 = 0.9645$ and $\alpha_1 = 0.99$. From the tables of the normal integral we have

$$P\{\bar{x} - \mu \le 2.326/\sqrt{n} \mid \mu\} = 0.99,$$

$$P\{\bar{x} - \mu \ge -1.806/\sqrt{n} \mid \mu\} = 0.9645,$$

and hence

$$P\{\bar{x} - 2.326/\sqrt{n} \le \mu \le \bar{x} + 1.806/\sqrt{n} \mid \mu\} = 0.9545.$$

Thus, with the same confidence coefficient we can assert that $\mu$ lies in the range $\bar{x} - 2/\sqrt{n}$
to $\bar{x} + 2/\sqrt{n}$, or in the range $\bar{x} - 2.326/\sqrt{n}$ to $\bar{x} + 1.806/\sqrt{n}$. In either case we shall be
right in about 95.45 per cent of the cases.

We note that in the first case the range is $4/\sqrt{n}$ units and in the second case it is
$4.132/\sqrt{n}$ units. Other things being equal, we should choose the first set of limits since
they locate the parameter in a narrower range. We shall consider this point in more
detail below. It does not always happen that there is an infinity of possible confidence
limits or, if there is, that any simple rule of choice between them can be formulated.
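
The comparison of the two ranges can be reproduced with a few lines of Python. This is a sketch added for illustration; the helper name `offsets` and its argument names are assumptions, and the standard library's normal quantile function stands in for the tables of the normal integral. The two arguments are the probabilities attached to the lower and the upper limit separately, so that the combined coefficient is their sum less one.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def offsets(p_lower, p_upper):
    """Return (a, b) such that the lower limit xbar - a/sqrt(n) holds with
    probability p_lower and the upper limit xbar + b/sqrt(n) holds with
    probability p_upper; the interval then has coefficient
    p_lower + p_upper - 1."""
    return Z.inv_cdf(p_lower), Z.inv_cdf(p_upper)

central = offsets(0.97725, 0.97725)   # both offsets close to 2
noncentral = offsets(0.99, 0.9645)    # close to 2.326 and 1.806

# Lengths of the two ranges in units of 1/sqrt(n)
central_width = sum(central)
noncentral_width = sum(noncentral)
print(round(central_width, 3), round(noncentral_width, 3))
```

Trying other pairs with the same sum shows the central choice giving the smallest width for this symmetrical distribution, which is the point taken up again in 19.11.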

Graphical Representation 

19.6. In a number of simple cases, including that of the previous example, the
confidence limits can be represented in a useful graphical form. We take two orthogonal
axes, OX relating to the observed $\bar{x}$ and OY to $\mu$ (see Fig. 19.1).

[Fig. 19.1]

The two straight lines shown have as their equations

$$\mu = \bar{x} - 2, \qquad \mu = \bar{x} + 2.$$

Consequently, for any point between the lines,

$$\bar{x} - 2 \le \mu \le \bar{x} + 2.$$

Hence if for any observed $\bar{x}$ we read off the two ordinates on the lines corresponding to
that value we obtain the two confidence limits. The vertical interval between the limits
is the confidence range (shown in the diagram for $\bar{x} = 1$), and the total zone between the
lines is the confidence belt. We may refer to the two lines as the Upper and Lower
Confidence lines respectively.

This example relates to the somewhat trivial case $n = 1$. For different values of $n$
there will be different confidence lines, all parallel to $\mu = \bar{x}$. They may be shown on a
single diagram for selected values of $n$, and a figure so constructed provides a useful method
of reading off confidence limits in practical work.

Central and Non-central Intervals

19.7. In Example 19.1 the sampling distribution on which the confidence intervals
were based was symmetrical, and hence, by taking equal deviations from the mean, we
obtained equal areas of the frequency function for $\alpha_0$ and $\alpha_1$. In general we cannot achieve
this result with equal deviations, and subject always to the condition $\alpha_0 + \alpha_1 - 1 = \alpha$
the two quantities may be chosen arbitrarily.

If $\alpha_0$ and $\alpha_1$ are taken to be equal, we shall say that the intervals are central. In such
a case we have

$$P(t_0 \le \theta) = P(\theta \le t_1) = \frac{1 + \alpha}{2}. \tag{19.4}$$

In the contrary case the intervals will be called non-central.

19.8. In the absence of other considerations it is usually convenient to employ
central intervals, but circumstances sometimes arise in which non-central intervals are
more serviceable. Suppose, for instance, we are estimating the proportion of some drug
in a medicinal preparation and the drug is toxic in large doses. We must then clearly
err on the safe side, an excess of the true value over our estimate being more serious than
a deficiency. In such a case we might prefer to take $\alpha_1$ very near to unity or even equal
to unity, so that

$$P(\theta \le t_1) = 1,$$
$$P(t_0 \le \theta) = \alpha,$$

and we are certain that $\theta$ is not greater than $t_1$.

Again, if we are estimating the proportion of viable seed in a sample of material that
is to be placed on the market, we are more concerned with the accuracy of the lower limit
than that of the upper limit, for a deficiency of germination is more serious than an excess
from the grower's point of view. In such circumstances we should probably take $\alpha_0$ as
large as conveniently possible so as to be nearer to certainty about the minimum value
of viability. This kind of situation often arises in the specification of the quality of a
manufactured product, the seller wishing to guarantee a minimum standard but being
much less concerned with whether his product exceeds expectation.

19.9. On a somewhat similar point, it may be remarked that in certain circumstances
it is enough to know that $P\{t_0 \le \theta \le t_1 \mid \theta\}$ exceeds some quantity $\alpha$. We then
know that in asserting $\theta$ to lie in the range $t_0$ to $t_1$ we shall be right in at least a proportion
$\alpha$ of the cases. Mathematical difficulties in ascertaining confidence limits exactly for
given $\alpha$, or theoretical difficulties when the distribution is discontinuous may, for example,
lead us to be content with the inequality rather than the equality of (19.3).

Example 19.2 

To find confidence intervals for the parent proportion $\varpi$ of successes in sampling for
attributes.

In samples of $n$ the distribution of successes is given by the binomial $(\chi + \varpi)^n$, where
$\chi = 1 - \varpi$. We will determine the limits for the case $n = 20$ and confidence coefficient 0.95.

We require in the first instance the distribution function of the binomial, which is
obtainable from Table 5.2 (vol. I, p. 119). Summing the number of successes and dividing
by 10,000, we find from that table the following:

Proportion of
successes p     ϖ = 0.1    ϖ = 0.2    ϖ = 0.3    ϖ = 0.4    ϖ = 0.5

    0.00        0.1216     0.0115     0.0008       -          -
    0.05        0.3918     0.0691     0.0076     0.0005       -
    0.10        0.6770     0.2060     0.0354     0.0036     0.0002
    0.15        0.8671     0.4114     0.1070     0.0159     0.0013
    0.20        0.9569     0.6296     0.2374     0.0509     0.0059
    0.25        0.9888     0.8042     0.4163     0.1255     0.0207
    0.30        0.9977     0.9133     0.6079     0.2499     0.0577
    0.35        0.9997     0.9678     0.7722     0.4158     0.1316
    0.40        1.0001     0.9900     0.8866     0.5955     0.2517
    0.45        1.0002     0.9974     0.9520     0.7552     0.4119
    0.50          -        0.9994     0.9828     0.8723     0.5881
    0.55          -        0.9999     0.9948     0.9433     0.7483
    0.60          -        1.0000     0.9987     0.9788     0.8684
    0.65          -          -        0.9997     0.9934     0.9423
    0.70          -          -        0.9999     0.9983     0.9793
    0.75          -          -          -        0.9996     0.9941
    0.80          -          -          -        0.9999     0.9987
    0.85          -          -          -        1.0000     0.9998
    0.90          -          -          -          -        1.0000
    0.95          -          -          -          -          -

The final figures may be a unit or two in error owing to rounding up, but that need
not bother us to the degree of approximation here considered. Values for $\varpi = 0.6$ to $0.9$
may be obtained by symmetry.
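
The distribution function tabulated above can also be computed directly rather than by summing a printed table. The following Python sketch is an added illustration under that assumption; `binom_cdf` is a hypothetical helper name, and the figures it prints are exact binomial sums, so they may differ from the table by a unit or two in the last place for the rounding reason just mentioned.

```python
from math import comb

def binom_cdf(k, n, w):
    """P(number of successes <= k) in n trials with success probability w."""
    return sum(comb(n, j) * w**j * (1 - w)**(n - j) for j in range(k + 1))

n = 20
for k in range(n + 1):
    # One row per observable proportion p = k/n, one column per value of w
    row = [binom_cdf(k, n, w) for w in (0.1, 0.2, 0.3, 0.4, 0.5)]
    print(f"{k / n:4.2f}  " + "  ".join(f"{v:6.4f}" for v in row))

# Values for w above 0.5 follow by symmetry:
# P(successes <= k | w) = 1 - P(successes <= n - k - 1 | 1 - w)
by_symmetry = 1 - binom_cdf(7, n, 0.4)
direct = binom_cdf(12, n, 0.6)
```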

We note in the first place that the variate $p$ is discontinuous. On the other hand
we are prepared to consider any value of $\varpi$ in the range 0 to 1. For given $\varpi$ we cannot
in general find limits to $p$ for which $\alpha$ is exactly 0.95; but we will take $p$ to be the nearest
multiple of 0.05 which gives confidence coefficients at least equal to 0.95, so as to be on
the safe side. We will consider only central intervals, so that for given $\varpi$ we have to find
$p_0$ and $p_1$ such that

$$P\{\varpi \ge p_0\} \ge 0.975,$$
$$P\{\varpi \le p_1\} \ge 0.975,$$

the inequalities for $P$ being as near to equality as we can make them.

Consider the diagrammatic representation of the type shown in Fig. 19.1 and given
for our present case in Fig. 19.2.

From the table we can find, for any assigned $\varpi$, the values $\varpi_0$ and $\varpi_1$ such that
$P(p \ge \varpi_0) \ge 0.975$ and $P(p \le \varpi_1) \ge 0.975$. Note that in determining $\varpi_1$ the distribution
function gives the probability of obtaining a proportion $p$ or less successes, so that the
complement of the function gives the probability of a proportion $1 - p - 0.05$ or less
of failures (not $1 - p$). Here, for example, on the horizontal through $\varpi = 0.1$ we find $\varpi_0 = 0$ and
$\varpi_1 = 0.30$ from our table; and for $\varpi = 0.4$ we have $\varpi_0 = 0.15$ and $\varpi_1 = 0.65$. The points
so obtained lie on stepped curves which have been drawn in. The zone between them is
the confidence belt. For any $p$ the probability that we shall be wrong in locating $\varpi$ inside
the belt is at the most 0.05. We determine $p_0$ and $p_1$ by drawing a vertical at the given
value of $p$ on the abscissa and reading off the values where it intersects the curves. That
these are, in fact, the required limits will be shown in a moment.

We could have found more precise confidence limits by interpolating in the table
obtained above. For example, with $p = 0.30$ we see that

for $\varpi = 0.1$, $P = 0.9977$
for $\varpi = 0.2$, $P = 0.9133$.

Hence, for $P = 0.975$ we have approximately

$$\varpi = 0.1 + \frac{9977 - 9750}{9977 - 9133}\,(0.1) = 0.127,$$

and closer approximations can be obtained if desired. The corresponding point on the
lower confidence line $\varpi = 0.127$ is $p = 0.35$.

[Fig. 19.2. Abscissa: values of p.]

Calculations on these lines give us the values of $\varpi$ such that

$$P\{p_0 \le \varpi \le p_1\} = \alpha \text{ exactly,}$$

whereas the former approach gave values such that

$$P\{p_0 \le \varpi \le p_1\} = \alpha \text{ approximately,}$$
$$\ge \alpha \text{ in any case.}$$

Discontinuous variates usually give rise to this sort of arithmetical nuisance. The
approximation in practice is sufficiently good except for small values of $n$. The smooth
curves in Fig. 19.2 give the more precise limits; they lie, of course, inside the more
approximate step-curves.

It is worth noticing that the points on the curves of Fig. 19.2 were constructed
by selecting an ordinate $\varpi$ and then finding the corresponding abscissae $\varpi_0$ and $\varpi_1$. The
diagram is, so to speak, constructed horizontally. In applying it, however, we read it
vertically, that is to say, with observed abscissa $p$ we read off two values $p_0$ and $p_1$ and
assert that $p_0 \le \varpi \le p_1$. It is instructive to observe how this change of viewpoint can
be justified without reference to Bayes' postulate.

Consider Fig. 19.3, which shows a pair of confidence lines for the binomial. Let $\varpi'$
be a given value of $\varpi$ and let the horizontal through $\varpi'$ meet the confidence lines in points
with abscissae $\varpi_0$ and $\varpi_1$. Then we know that in repeated samples from a population
with parameter $\varpi'$ a proportion $\alpha$ will give observed values of $p$ lying between $\varpi_0$ and $\varpi_1$;
for the curves were constructed so that this should be so.

Now since the horizontal at $\varpi'$ lies entirely within the confidence belt for $\varpi_0 \le p \le \varpi_1$
(and does so for any $\varpi'$), it follows that the assertion that $\varpi'$ lies in the belt is correct if,
and only if, $p$ lies between $\varpi_0$ and $\varpi_1$, that is in a proportion $\alpha$ of the cases. This, being
true for any $\varpi'$, is true for all $\varpi'$, irrespective of the relative frequency of occurrence of the
$\varpi$'s under estimate. Consequently our assertion that $\varpi$ lies in the confidence belt is correct
in a proportion $\alpha$ of the cases; and, in particular, for any observed $p$ we may assert that
$\varpi$ lies within the ordinates determined on the two curves by the vertical through $p$.
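
The interpolation used earlier in this example, which locates the value of $\varpi$ at which the tabulated probability for $p = 0.30$ equals 0.975, can be compared with a direct numerical solution. The Python sketch below is an added illustration; `binom_cdf` and `solve_w` are hypothetical helper names, and bisection is simply one convenient root-finder, not a method prescribed by the text. It relies on the fact that, for a fixed number of successes, the binomial distribution function decreases steadily as $\varpi$ increases.

```python
from math import comb

def binom_cdf(k, n, w):
    """P(number of successes <= k) in n trials with success probability w."""
    return sum(comb(n, j) * w**j * (1 - w)**(n - j) for j in range(k + 1))

def solve_w(k, n, target, tol=1e-10):
    """Find w in (0, 1) with binom_cdf(k, n, w) = target by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid   # distribution function still too large: increase w
        else:
            hi = mid
    return (lo + hi) / 2

# Linear interpolation between the tabulated columns w = 0.1 and w = 0.2
linear = 0.1 + (0.9977 - 0.9750) / (0.9977 - 0.9133) * 0.1

# Direct solution for n = 20 and p = 0.30, i.e. 6 successes or fewer
exact = solve_w(6, 20, 0.975)
print(round(linear, 3), round(exact, 3))
```

Because the distribution function is far from linear in $\varpi$ over so wide a tabular interval, the direct solution comes out somewhat above the linearly interpolated figure, which illustrates the remark that closer approximations can be obtained if desired.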


Confidence Intervals for Large Samples 

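19.10. In our usual notation, the logarithm of the likelihood function gives

$$\log L = \sum_{j=1}^{n} \log f(x_j, \theta), \tag{19.5}$$

$$\frac{\partial \log L}{\partial \theta} = \sum_{j=1}^{n} \frac{\partial \log f(x_j, \theta)}{\partial \theta}. \tag{19.6}$$

We may regard $\partial \log f / \partial \theta$ as a random variable, and in particular write

$$\operatorname{var}\left(\frac{\partial \log f}{\partial \theta}\right) = A,$$

so that

$$\operatorname{var}\left(\frac{\partial \log L}{\partial \theta}\right) = nA. \tag{19.7}$$

Write

$$\psi = \frac{\partial \log L / \partial \theta}{\sqrt{nA}}. \tag{19.8}$$

Then, for large samples, $\psi$ will be distributed normally in the limit with unit variance, in
virtue of the Central Limit Theorem, under very general conditions. It will also have
zero mean, since

$$E\left(\frac{\partial \log f}{\partial \theta}\right) = \int \frac{1}{f}\frac{\partial f}{\partial \theta}\, f\,dx = \frac{\partial}{\partial \theta}\int f\,dx = 0. \tag{19.9}$$

Hence, from the distribution of $\psi$ we may easily determine confidence limits for $\theta$ in large
samples if $\psi$ is a monotonic function of $\theta$, so that inequalities in one may be transformed to
inequalities in the other.

It is sufficient (but not necessary) for the existence of the normal limit to $\psi$ that
$\partial \log f / \partial \theta$ exists for all $x$, except perhaps at isolated points, that the range is independent
of $\theta$ and that the Central Limit Theorem applies (e.g. if the third moment of
$\partial \log f / \partial \theta$ exists). We also assume, as usual, that differentiation under the integral
sign, as in (19.9), is legitimate.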


Example 19.3 

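Consider again the problem of Example 19.1. We have, with $\mu$ for $\theta$,

$$f(x, \mu) = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\},$$

$$\frac{\partial \log f}{\partial \mu} = x - \mu, \qquad \operatorname{var}\left(\frac{\partial \log f}{\partial \mu}\right) = 1.$$

Hence

$$\psi = \frac{\sum (x - \mu)}{\sqrt{n}} = (\bar{x} - \mu)\sqrt{n}$$

is normally distributed with unit variance for large $n$. (We know, of course, that this
is true for small $n$ as well in this particular case.) The confidence limits may then be set
as in Example 19.1.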


Example 19.4 

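Consider the Poisson distribution whose general term is

$$f(x, \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}.$$

We have

$$\frac{\partial \log f}{\partial \lambda} = \frac{x}{\lambda} - 1, \qquad \operatorname{var}\left(\frac{\partial \log f}{\partial \lambda}\right) = \frac{1}{\lambda}.$$

Hence

$$\psi = \frac{\sum (x/\lambda - 1)}{\sqrt{n/\lambda}} = (\bar{x} - \lambda)\sqrt{\frac{n}{\lambda}}.$$

For example, with $\alpha = 0.95$, corresponding to a normal deviate $\pm 1.96$, we have, for the
central confidence limits,

$$(\bar{x} - \lambda)\sqrt{\frac{n}{\lambda}} = \pm 1.96,$$

giving, on solution for $\lambda$,

$$\lambda^2 - \left(2\bar{x} + \frac{3.84}{n}\right)\lambda + \bar{x}^2 = 0,$$

$$\lambda = \bar{x} + \frac{1.92}{n} \pm \sqrt{\frac{3.84\,\bar{x}}{n} + \frac{3.69}{n^2}},$$

the ambiguity in the square root giving upper and lower limits respectively.
To order $n^{-1/2}$ this is equivalent to

$$\lambda = \bar{x} \pm 1.96\sqrt{\frac{\bar{x}}{n}},$$

from which the upper and lower limits are seen to be equidistant from the mean $\bar{x}$, as we
should expect.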
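
The two forms of the Poisson limits obtained in this example can be evaluated side by side. The Python sketch below is an added illustration; the function names and the trial values $\bar{x} = 4$, $n = 25$ are assumptions chosen only to exercise the formulae. The first function solves the quadratic in $\lambda$ exactly, the second uses the first-order form that is symmetrical about $\bar{x}$.

```python
from math import sqrt
from statistics import NormalDist

def poisson_limits(xbar, n, alpha=0.95):
    """Roots of (xbar - lam) * sqrt(n / lam) = +/- z, i.e. of
    lam^2 - (2*xbar + z^2/n)*lam + xbar^2 = 0."""
    z = NormalDist().inv_cdf((1 + alpha) / 2)   # about 1.96 for alpha = 0.95
    centre = xbar + z * z / (2 * n)
    half = sqrt(z * z * xbar / n + z**4 / (4 * n * n))
    return centre - half, centre + half

def poisson_limits_first_order(xbar, n, alpha=0.95):
    """Approximation xbar +/- z * sqrt(xbar / n), valid to order n^(-1/2)."""
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    half = z * sqrt(xbar / n)
    return xbar - half, xbar + half

lo, hi = poisson_limits(4.0, 25)
lo1, hi1 = poisson_limits_first_order(4.0, 25)
print(round(lo, 3), round(hi, 3), round(lo1, 3), round(hi1, 3))
```

The exact roots are not quite equidistant from $\bar{x}$; their product equals $\bar{x}^2$, as the quadratic requires, and the asymmetry disappears to the order of approximation of the second form.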


Shortest Sets of Confidence Intervals 

19.11. It has been seen in Example 19.1 that in some circumstances at least there
exists more than one set of confidence intervals, and it is now necessary to consider whether
any particular set can be regarded as better than the others in any useful sense. The
problem is analogous to that of estimators, where we found that in general there are many
different estimators for a parameter, but that we could sometimes find one (such as that
with minimum variance) which was superior to the rest.

In Example 19.1 the problem presented itself in rather a specialised form. We found
that for the intervals based on the mean $\bar{x}$ there were infinitely many sets of intervals
according to the way in which we selected $\alpha_0$ and $\alpha_1$ (subject to the condition that
$\alpha_0 + \alpha_1 = 1 + \alpha$). Among these the central intervals are obviously the shortest, for a
given range will include the greatest area of the normal curve if it is centred at the mean
of the curve. We might reasonably say that the central intervals are the best among
those determined by $\bar{x}$.

But it does not follow that they are the shortest of all possible intervals, or even that
such a shortest set exists. It might also happen that for two sets of intervals $C_1$ and $C_2$
those of $C_1$ are shorter than those of $C_2$ in part of the range of $x$'s and longer in other parts.

19.12. We will therefore consider sets of intervals which are shortest on the average.
That is to say, if

$$\delta = t_1 - t_0$$

we require that

$$\int \delta\, dF = \text{minimum}, \tag{19.10}$$

where the integral is taken over all $x$'s and is therefore equivalent to

$$\int \cdots \int \delta\, L\, dx_1 \ldots dx_n. \tag{19.11}$$

We now prove a theorem which is very similar to the result that maximum-likelihood
estimators in the limit have minimum variance, namely that in a certain class of intervals
the method of 19.10 gives those which are shortest on the average.
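Let $h(x, \theta)$ be a function which has a zero mean value and is such that the sum of
a number of similar functions obeys the Central Limit Theorem. Then

$$\zeta = \frac{\sum_{j=1}^{n} h(x_j, \theta)}{\sqrt{n \operatorname{var} h}} \tag{19.12}$$

is distributed normally in the limit with zero mean and unit variance. $\psi$ of equation
(19.8) is a member of the class $\zeta$. We prove that the average rate of change of $\psi$ with
respect to $\theta$, for each fixed $\theta$, is greater than that of any $\zeta$ except in the trivial case

$$h = k\,\frac{\partial \log f}{\partial \theta}.$$

Writing $g(x, \theta) = \dfrac{\partial \log f}{\partial \theta}$ we have

$$\frac{\partial \psi}{\partial \theta} = \frac{1}{\sqrt{n \operatorname{var} g}} \left\{ \sum \frac{\partial g}{\partial \theta} - \frac{\sum g}{2 \operatorname{var} g} \frac{\partial \operatorname{var} g}{\partial \theta} \right\}, \tag{19.13}$$

$$\frac{\partial \zeta}{\partial \theta} = \frac{1}{\sqrt{n \operatorname{var} h}} \left\{ \sum \frac{\partial h}{\partial \theta} - \frac{\sum h}{2 \operatorname{var} h} \frac{\partial \operatorname{var} h}{\partial \theta} \right\}. \tag{19.14}$$

Now $E(g) = 0$ and

$$E\left(\frac{\partial g}{\partial \theta}\right) = -\operatorname{var} g.$$

Thus

$$E\left(\frac{\partial \psi}{\partial \theta}\right) = -\frac{n \operatorname{var} g}{\sqrt{n \operatorname{var} g}} = -\sqrt{n \operatorname{var} g} = A_1, \text{ say.} \tag{19.15}$$

Similarly,

$$E\left(\frac{\partial \zeta}{\partial \theta}\right) = \sqrt{\frac{n}{\operatorname{var} h}}\; E\left(\frac{\partial h}{\partial \theta}\right) = A_2, \text{ say.} \tag{19.16}$$

Since $E(h) = 0$ we have

$$E\left(\frac{\partial h}{\partial \theta}\right) = -\operatorname{cov}(h, g). \tag{19.17}$$

Hence

$$A_1^2 - A_2^2 = n \operatorname{var} g - \frac{n \operatorname{cov}^2(h, g)}{\operatorname{var} h}
= \frac{n}{\operatorname{var} h} \{\operatorname{var} h \operatorname{var} g - \operatorname{cov}^2(h, g)\}. \tag{19.18}$$

Thus, unless $h$ is a multiple of $g$, we have

$$A_1^2 > A_2^2,$$

which was to be proved.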
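
A small worked case may make the inequality concrete. The following is an illustration added here, not taken from the original text: it assumes the unit-variance normal population of Example 19.1, so that $g = x - \theta$, and takes as a competing function the cube $h = (x - \theta)^3$, which has zero mean and finite variance. The moments used are the standard central moments of the normal distribution.

```latex
% Assumed setting: x normal with mean \theta and unit variance.
% g = \partial \log f / \partial \theta = x - \theta, and the trial function h = (x - \theta)^3.
\begin{align*}
\operatorname{var} g &= E(x-\theta)^2 = 1, &
\operatorname{var} h &= E(x-\theta)^6 = 15, \\
\operatorname{cov}(h, g) &= E(x-\theta)^4 = 3, &
E\!\left(\frac{\partial h}{\partial \theta}\right) &= -3\,E(x-\theta)^2 = -3 = -\operatorname{cov}(h, g), \\
A_1^2 &= n \operatorname{var} g = n, &
A_2^2 &= \frac{n \operatorname{cov}^2(h, g)}{\operatorname{var} h} = \frac{9n}{15} = 0.6\,n, \\
A_1^2 - A_2^2 &= \frac{n}{15}\,(15 \cdot 1 - 3^2) = 0.4\,n > 0. &&
\end{align*}
```

The second line confirms (19.17) for this choice of $h$, and the last line agrees with (19.18).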

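Now if $\gamma_\alpha$ is a value such that

$$\frac{1}{\sqrt{2\pi}} \int_0^{\gamma_\alpha} e^{-\frac{1}{2}x^2}\, dx = \tfrac{1}{2}\alpha,$$

the upper and lower confidence points for central intervals are $\pm\gamma_\alpha$ and the values of $\theta$
are the solutions of

$$\frac{\sum g(x_j, \theta)}{\sqrt{n \operatorname{var} g}} = \pm\gamma_\alpha,$$

say $t_0$ and $t_1$. Similarly those for any function $h$ are given by

$$\frac{\sum h(x_j, \theta)}{\sqrt{n \operatorname{var} h}} = \pm\gamma_\alpha,$$

say $u_0$ and $u_1$. The equations for confidence points are equivalent to

$$\psi(t) = \pm\gamma_\alpha,$$
$$\zeta(u) = \pm\gamma_\alpha,$$

or, effectively, in large samples, by

$$\psi(\theta_0) + (t - \theta_0)\left(\frac{\partial \psi}{\partial \theta}\right)_{\theta_0} = \pm\gamma_\alpha, \tag{19.19}$$

$$\zeta(\theta_0) + (u - \theta_0)\left(\frac{\partial \zeta}{\partial \theta}\right)_{\theta_0} = \pm\gamma_\alpha, \tag{19.20}$$

where $\theta_0$ is a fixed value of $\theta$. When $t \to \theta_0$ and $u \to \theta_0$ we have $\psi(\theta_0) = \zeta(\theta_0)$. Hence

$$(t - \theta_0)\left(\frac{\partial \psi}{\partial \theta}\right)_{\theta_0} = (u - \theta_0)\left(\frac{\partial \zeta}{\partial \theta}\right)_{\theta_0}. \tag{19.21}$$

Now we have just shown that, on the average, $\partial \psi / \partial \theta$ exceeds $\partial \zeta / \partial \theta$ in absolute value.
Hence, on the average,

$$|t - \theta_0| < |u - \theta_0|,$$

and the confidence limits $t$ are closer together than those of any member of the class $u$ for
any fixed value of $\theta$.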

19.13. A comparison of the result we have just proved and the properties of maximum
likelihood estimators in the limit will show the close relation between confidence
intervals and the theory of estimation developed in Chapter 17. In 17.27 we showed,
by considering the quantity $u = \dfrac{\partial \log L}{\partial \theta}$, that any estimator $t$ which is in the limit
distributed normally about the true value $\theta_0$ cannot have a variance less than

$$1 \Big/ \left\{ n E\left(\frac{\partial \log f}{\partial \theta}\right)^2 \right\},$$

and that the latter quantity, in the limit, is the variance of the maximum likelihood
estimator. It attains the minimal value when $u$ is constant over samples for which $t$ is constant.

The theorem of 19.12 shows that on the average the intervals determined by the
distribution of $u$ are shorter than those based on any other function with a zero mean value
(obeying the usual conditions as to continuity, etc.). Since the maximum likelihood
estimator has minimum variance, we should expect that confidence intervals based on its
distribution would be shorter than others; and this we now see to be so. For if $u$ is constant
over samples of constant $t$, the distribution of $u$ in all samples is equivalent to that of $t$.


Confidence Intervals and Sufficient Estimators

19.14. Pursuing this line of thought, we are led to inquire whether sufficient
estimators provide confidence intervals for finite samples and whether they have any minimal
properties of the kind we have just established for large samples.

It is easy to see that sufficient estimators do in fact provide confidence intervals.
If $t$ is sufficient for $\theta$, the likelihood function may be put in the form

$$L = f_1(t, \theta)\, f_2(x_1 \ldots x_n) \tag{19.22}$$

and the distribution of $t$, for given $\theta$, is

$$dF = f_1(t, \theta)\, dt. \tag{19.23}$$

Given $\alpha$ we can then find $t_0$ and $t_1$ such that $F(t_0, \theta) = 1 - \alpha_0$ and $F(t_1, \theta) = \alpha_1$ and solve
for $\theta$ in terms of $t_0$ and $\alpha_0$ or $t_1$ and $\alpha_1$, as the case may be. This process will provide the
inequalities of the type we require, a proposition which we shall prove formally below.


Example 19.5 

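In Example 17.8 we saw that

$$t = \frac{\bar{x}}{p}$$

is sufficient for $\theta$ in the distribution

$$dF = \frac{x^{p-1} e^{-x/\theta}\, dx}{\Gamma(p)\, \theta^p}, \qquad 0 \le x \le \infty, \quad p > 0,$$

where $p$ is regarded as known. The distribution of $t$ is in fact

$$dF = \left(\frac{np}{\theta}\right)^{np} \frac{t^{np-1}}{\Gamma(np)} \exp\left(-\frac{np\,t}{\theta}\right) dt.$$

The distribution function of $m = \dfrac{np\,t}{\theta}$ is the incomplete $\Gamma$-function

$$F(m) = \frac{\Gamma_m(np)}{\Gamma(np)} = I\left(\frac{m}{\sqrt{np}},\; np - 1\right).$$

We then find the values of $m$ corresponding to $\alpha_0$ and $\alpha_1$ from the tables, and have

$$P(m \le m_0) = \alpha_0,$$
$$P(m \ge m_1) = \alpha_1,$$

whence

$$P\left\{\frac{np\,t}{m_0} \le \theta \le \frac{np\,t}{m_1}\right\} = \alpha_0 + \alpha_1 - 1
= \alpha.$$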
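
The limits in this example can be computed without tables when $np$ is a whole number, because the incomplete $\Gamma$-function then reduces to a finite sum. The Python sketch below is an added illustration under that restriction; the function names, the symbol `t` for the sufficient statistic $\bar{x}/p$, and the trial values $n = 10$, $p = 1$, $t = 2$ are all assumptions, and bisection is used merely as a convenient way of inverting the distribution function.

```python
from math import exp

def gamma_cdf_int(m, k):
    """P(M <= m) for the gamma variate with integer index k and unit scale:
    1 - exp(-m) * sum_{j<k} m^j / j!."""
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= m / j
        total += term
    return 1.0 - exp(-m) * total

def gamma_quantile_int(prob, k, hi=1000.0, tol=1e-10):
    """Value m with gamma_cdf_int(m, k) = prob, found by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gamma_cdf_int(mid, k) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, p, t = 10, 1, 2.0           # hypothetical sample size, known index, statistic
alpha0 = alpha1 = 0.975        # central intervals, coefficient 0.95
k = n * p                      # index of the distribution of m = n*p*t/theta

m0 = gamma_quantile_int(alpha0, k)       # P(m <= m0) = alpha0
m1 = gamma_quantile_int(1 - alpha1, k)   # P(m >= m1) = alpha1
lower, upper = n * p * t / m0, n * p * t / m1
print(round(m0, 3), round(m1, 3), round(lower, 3), round(upper, 3))
```

For non-integral $np$ a general incomplete gamma routine or the published tables would be needed instead of the finite sum.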

19.15. The position in regard to minimal properties of confidence intervals based
on sufficient estimators remains somewhat obscure, but one would expect some such
properties to hold even for finite $n$. Since $u = \dfrac{\partial \log L}{\partial \theta}$ is constant for constant $t$ when $t$ is
sufficient, the variance of $u$ will be a function of the variance of $t$. This, however, is not necessarily
enough to establish the fact that the corresponding confidence intervals are shortest on the
average. It is imaginable that the confidence intervals derived from its distribution might
be longer on the average than those of some other system. This seems rather unlikely,
at least for the ordinary distributions of statistical theory, but apparently no proof has
been given.


19.16. Neyman (1937b) has proposed to apply the phrase "shortest confidence
intervals" to sets of intervals defined in quite a different way. As it does not appear
that such intervals are necessarily the shortest in the sense of possessing the least length,
even on the average, we shall attempt to avoid confusion by calling them "most selective."

Consider a set of intervals $C_0$, typified by $\delta_0$, obeying the condition that

$$P\{\delta_0 \; c \; \theta \mid \theta\} = \alpha, \tag{19.24}$$

where we write $\delta_0 \; c \; \theta$ (that is, $\delta_0$ "contains" $\theta$) for the more usual $t_0 \le \theta \le t_1$
($t_1 - t_0 = \delta_0$).

Let $C_1$ be some other set typified by $\delta_1$ such that

$$P\{\delta_1 \; c \; \theta \mid \theta\} = \alpha. \tag{19.25}$$

Either set is a permissible set of intervals, as the probability is $\alpha$ in both cases that the
range $\delta$ contains $\theta$.

If now for every $C_1$ we have, for any value $\theta'$ other than the true value,

$$P\{\delta_0 \; c \; \theta' \mid \theta\} \le P\{\delta_1 \; c \; \theta' \mid \theta\}, \tag{19.26}$$

$C_0$ is said to be most selective.

19.17. The ideas underlying this definition will be clearer from a reading of Chapters
26 and 27 dealing with the Neyman-Pearson theory of inference. We anticipate them here
to the extent of remarking that the object of most selective intervals is to cover the true
value with assigned probability $\alpha$, but to cover other values as little as possible. We may
say of both $C_0$ and $C_1$ that the assertion $\delta \; c \; \theta$ is true in proportion $\alpha$ of the cases. What
marks out $C_0$ for choice as the most selective set is that it covers false values less frequently
than the remaining sets.

The difference between this approach and the one leading to shortest intervals is that
the latter is concerned only with the narrowness of the confidence interval, whereas the
former gives weight to the frequency with which alternative values of $\theta$ are covered. One
concentrates on locating $\theta$ with the smallest margin of error; the other takes into account
the desirability of excluding so far as possible false values of $\theta$ from the interval, so that
mistakes of taking the wrong value are minimised.


19.18. Neyman himself has shown that most selective sets do not usually exist (for
instance, if the distribution is continuous) and has proposed two alternative systems:

(a) most selective one-sided systems (Neyman's "shortest one-sided" sets) which
obey (19.26) only for values of $\theta' - \theta$ which are always positive or always negative;

(b) selective unbiassed systems (Neyman's "short unbiassed" sets) which obey
(19.25) but, in place of (19.26), the further relation

$$P\{\delta \; c \; \theta \mid \theta\} = \alpha \ge P\{\delta \; c \; \theta \mid \theta'\}. \tag{19.27}$$

In essence these sets amount to a translation into terms of confidence intervals of
certain ideas in the theory of tests of significance, and we may defer consideration of them
until Chapters 26 and 27 are reached.

Generalisation to the Case of Several Parameters 

19.19. We now proceed to generalise the foregoing theory to the case of several
parameters. Although, to simplify the exposition, we shall deal in detail only with a single
variate, the theory is quite general. We begin by extending our notation and introducing
a geometrical terminology which may be regarded as an elaboration of the diagrams of
Figs. 19.1 and 19.2.

Suppose we have a frequency function of known form depending on $l$ unknown
parameters, $\theta_1 \ldots \theta_l$, and denoted by $f(x, \theta_1 \ldots \theta_l)$. We may require to estimate either
$\theta_1$ only or several of the $\theta$'s simultaneously. In the first place we consider only the
estimation of a single parameter. To determine confidence limits we require to find two functions
$u_0$ and $u_1$, dependent on the sample values but not on the $\theta$'s, such that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1 \ldots \theta_l\} = \alpha, \tag{19.28}$$

where $\alpha$ is the confidence coefficient chosen in advance.

With a sample of $n$ values, $x_1 \ldots x_n$, we can associate a point in an $n$-dimensional
Euclidean space, and the frequency-distribution will determine a density function for
each such point. The quantities $u_0$ and $u_1$, being functions of the $x$'s, are determined in
this space, and for any given $\alpha$ will lie on two hypersurfaces (the natural extension of the
confidence lines of Fig. 19.1). Between them will lie a Confidence Zone or Region of
Acceptance.

In general we also have to consider a range of values of $\theta$ which are a priori possible.
There will thus be an $l$-dimensional space of $\theta$'s subjoined to the $n$-space, the total region
of variation having $(l + n)$ dimensions; but if we are considering the estimation of $\theta_1$,
this reduces to an $(n + 1)$-space, the other $(l - 1)$ parameters not appearing as variables.

We shall call the sample-space $W$ and denote a point whose co-ordinates are $x_1 \ldots x_n$
by $E$. We may then write $u_0(E)$, $u_1(E)$ to show that the confidence functions depend
on $E$. The interval $u_1(E) - u_0(E)$ we denote by $\delta(E)$ or $\delta$, and as above we write $\delta \; c \; \theta_1$
to denote $u_0 \le \theta_1 \le u_1$. The region of acceptance or confidence zone we denote by $A$,
and may write $E \in \delta$ or $E \in A$ to indicate that the sample-point lies in the interval $\delta$ or
the region $A$.

19.20. In Fig. 19.4 we have shown two axes $x_1$ and $x_2$ and a third axis corresponding
to the variation of $\theta_1$. The sample-space $W$ is thus two-dimensional. For any given
$\theta_1$, say $\theta_1'$, the space $W$ is a hyperplane (or part of it), one such being shown.

[Fig. 19.4]

Take any given pair of values $(x_1, x_2)$ and draw through the point so defined a line
parallel to the $\theta_1$-axis, such as $PQ$ in the figure, cutting the hyperplane at $R$. The two
values of $u_0$ and $u_1$ will give two limits to $\theta_1$ corresponding to two points on this line, say
$U$, $V$. Consider now the lines $PQ$ as $x_1$, $x_2$ vary. In some cases $U$, $V$ will lie on opposite
sides of $R$, and $\theta_1'$ lies inside the interval $UV$. In other cases (as for instance in $U'V'$ shown
in the figure) the contrary is true. The totality of points in the former category determines
the region of acceptance $A$, shaded in the figure. If for any point in $A$ we assert
$\delta \; c \; \theta_1'$, we shall be right; if we assert it for points outside $A$ we shall be wrong.


19.21. Evidently, if the sample-point $E$ falls in the region $A$, the corresponding
$\theta_1'$ lies in the confidence interval and conversely. It follows that the probability of any
fixed $\theta_1'$ lying in the confidence interval is the probability that $E$ lies in $A(\theta_1')$, or in
symbols

$$P\{\delta \; c \; \theta_1' \mid \theta_1' \ldots \theta_l\} = P\{u_0 \le \theta_1' \le u_1 \mid \theta_1' \ldots \theta_l\}
= P\{E \in A(\theta_1') \mid \theta_1' \ldots \theta_l\}. \tag{19.29}$$

From this it follows that if the confidence functions are determined so that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1 \ldots \theta_l\} = \alpha$$

we shall have, for all $\theta_1$,

$$P\{E \in A(\theta_1) \mid \theta_1 \ldots \theta_l\} = \alpha. \tag{19.30}$$

It follows also that for no $\theta_1$ can the region $A$ be empty, for if it were the probability in
(19.30) would be zero.

19.22. If the functions $u_0$ and $u_1$ are single-valued and determined for all $E$, then
any sample-point will fall into at least one region of acceptance. For on the line $PQ$
corresponding to the given $E$ we take an $R$ between $U$ and $V$, and this will define a value of
$\theta_1$, say $\theta_1'$, such that $E \in A(\theta_1')$.

More importantly, if a sample-point falls in the regions $A(\theta_1')$ and $A(\theta_1'')$ corresponding
to two values of $\theta_1$, $\theta_1'$ and $\theta_1''$, it will fall in the region $A(\theta_1''')$ where $\theta_1'''$ is any value
between $\theta_1'$ and $\theta_1''$. For we have

$$u_0 \le \theta_1' \le u_1, \qquad u_0 \le \theta_1'' \le u_1,$$

and hence

$$u_0 \le \theta_1' \le \theta_1'' \le u_1$$

if $\theta_1''$ is the greater, and hence

$$u_0 \le \theta_1' \le \theta_1''' \le \theta_1'' \le u_1$$

or

$$u_0 \le \theta_1''' \le u_1.$$

Further, if a sample-point falls in any of the regions $A(\theta_1)$ for the range of $\theta$-values
$\theta_1' < \theta_1 < \theta_1''$ it must also fall within $A(\theta_1')$ and $A(\theta_1'')$.

19.23. The conditions referred to in the two previous sections are necessary. We
now prove that they are sufficient, that is to say: if for each value of $\theta_1$ there is defined
in the sample-space $W$ a region $A$ such that

(1) $P\{E \in A(\theta_1) \mid \theta_1\} = \alpha$, whatever the value of the $\theta$'s;

(2) For any $E$ there is at least one $\theta_1$, say $\theta_1'$, such that $E \in A(\theta_1')$;

(3) If $E \in A(\theta_1')$ and $E \in A(\theta_1'')$, then $E \in A(\theta_1''')$ for any $\theta_1'''$ between $\theta_1'$ and $\theta_1''$;

(4) If $E \in A(\theta_1)$ for any $\theta_1$ satisfying $\theta_1' < \theta_1 < \theta_1''$, $E \in A(\theta_1')$ and $E \in A(\theta_1'')$;

then $u_0$ and $u_1$, viz. confidence limits for $\theta_1$, are given by taking the lower and upper bounds
of values of $\theta_1$ for which a fixed sample-point falls within $A(\theta_1)$. They are determinate
and single-valued for all $E$, $u_0 \le u_1$, and $P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = \alpha$ for all $\theta_1$.

The lower and upper bounds exist in virtue of condition (2), and the lower is not greater
than the upper. We have then merely to show that $P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = \alpha$, and for
this it is sufficient, in virtue of condition (1), to show that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = P\{E \in A(\theta_1) \mid \theta_1\}. \tag{19.31}$$

We already know that if $E \in A(\theta_1)$ then $u_0 \le \theta_1 \le u_1$; and our result will be established
if we demonstrate the converse.

Suppose it is not true that when $u_0 \le \theta_1 \le u_1$, $E \in A(\theta_1)$. Let $E'$ be a point outside
$A(\theta_1)$ for which $u_0 \le \theta_1 \le u_1$. Then must either $u_0 = \theta_1$ or $u_1 = \theta_1$ or both; for otherwise,
$u_0$ and $u_1$ being the bounds of the values of $\theta_1$ for which $E'$ lies in $A(\theta_1)$, there would
exist values $\theta_1'$ and $\theta_1''$ such that $E' \in A(\theta_1')$ and $E' \in A(\theta_1'')$ and

$$u_0 \le \theta_1' < \theta_1 < \theta_1'' \le u_1,$$

so that, from condition (3), $E' \in A(\theta_1)$, which is contrary to assumption.

Thus $u_0 = \theta_1$ or $u_1 = \theta_1$ or both. If both, then $E'$ must fall in $A(\theta_1)$, for $u_0$ and $u_1$
are the bounds of $\theta$-values for which this is so, and if they coincide their common value
must be so. Finally, if $u_0 = \theta_1 < u_1$ (and similarly if $u_0 < \theta_1 = u_1$) we see that for any value
strictly between $u_0$ and $u_1$, $E'$ must fall in the corresponding region $A$ from condition (3), and
hence, from condition (4), $E'$ must fall in $A(\theta_1')$ and $A(\theta_1'')$ where $\theta_1' = u_0$ and $\theta_1'' = u_1$.
Hence it falls in $A(\theta_1)$.

19.24. The foregoing theorem gives us a formal solution of the problem of finding
confidence intervals in the general case, but it does not provide a method of finding the
intervals in particular instances. In practice we have three lines of approach: (1) to use
sufficient estimators, (2) to adopt the process known as "studentisation," and (3) to
"guess" a set of intervals in the light of general knowledge and experience and to verify
that they do or do not satisfy the required conditions.

19.25. Consider the use of sufficient estimators in the general case. If $t_1$ is sufficient
for $\theta_1$ we have

$$L = L_1(t_1, \theta_1)\, L_2(x_1 \ldots x_n, \theta_2 \ldots \theta_l). \tag{19.32}$$

The locus $t_1 =$ constant determines a series of hypersurfaces in the sample-space $W$. If
we regard these hypersurfaces as determining regions in $W$, then $t_1 \le k$, say, determines
a fixed region $K$. The probability that $E$ falls in $K$ is then clearly dependent only on
$t_1$ and $\theta_1$. By appropriate choice of $k$ we can determine $K$ so that

$$P\{E \in K \mid \theta_1\} = \alpha,$$

and hence set up regions of acceptance based on values of $t_1$. We can do so, moreover,
in an infinity of ways, according to the values selected for $\alpha_0$ and $\alpha_1$.


Studentisation 

19.26. In Example 19.1 we considered a simplified problem of estimating the mean
in samples from a normal population with unit variance. Suppose now that we require
to determine confidence limits for the mean $\mu$ in samples from

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} dx.$$

The approach of Example 19.1 would lead us to the conclusion that, for confidence coefficient
0.9545 and central intervals,

$$P\left\{\bar{x} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{2\sigma}{\sqrt{n}} \,\Big|\, \mu, \sigma\right\} = 0.9545.$$

But we cannot now say that the confidence limits are $\bar{x} \pm 2\sigma/\sqrt{n}$ because $\sigma$ is unknown.

Consider then the distribution of $z = (\bar{x} - \mu)/s$, where $s^2$ is the sample variance. This
is known to be of the “Student” form

$$dF \propto \frac{dz}{(1+z^2)^{n/2}}.$$

(Cf. Example 10.6, vol. I, p. 239.) Given $\alpha$, we can now find $z_0$ and $z_1$ such that

$$\int_{-\infty}^{-z_1} dF + \int_{z_0}^{\infty} dF = 1 - \alpha,$$

and hence

$$P\{-z_1 \le z \le z_0\} = \alpha,$$

which is equivalent to

$$P\{\bar{x} - s z_0 \le \mu \le \bar{x} + s z_1\} = \alpha.$$

Hence we may say that $\mu$ lies in the range $\bar{x} - s z_0$ to $\bar{x} + s z_1$ with confidence coefficient
$\alpha$, the range now being independent of either $\mu$ or $\sigma$. In fact, owing to the symmetry of
“Student's” distribution, $z_0 = z_1$, but this is an accidental circumstance peculiar to the
present case.

19.27. The possibility of finding confidence intervals in this case arose from our
being able to find a statistic $z$, depending only on the parameter under estimate, whose
distribution did not contain $\sigma$. A scale parameter can often be eliminated in this way,
although the resulting distributions are not always easy to handle. If, for instance, we
have a statistic $t$ which is of degree $p$ in the variables, then $t/s^p$ is of degree zero, and its
distribution must be independent of the scale parameter. When a statistic is reduced
to independence of the scale in this way it is said to be “studentised,” after “Student”
(W. S. Gosset), who was the first to perceive the significance of the process.
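As an illustration only, the following Python sketch checks numerically that the studentised mean $(\bar{x} - \mu)/s$ is unchanged when every deviation from $\mu$ is multiplied by a constant, which is the property that removes the scale parameter from its distribution. The function name `studentised_mean` and the observations are assumptions made for the example, not part of the text.

```python
import math

def studentised_mean(sample, mu):
    """Return z = (mean - mu) / s, with s**2 the sample variance (divisor n)."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return (mean - mu) / s

mu = 3.0
sample = [2.1, 3.7, 4.4, 2.9, 3.3]      # hypothetical observations
# Multiplying deviations from mu by 5 mimics a change of sigma to 5*sigma.
rescaled = [mu + 5.0 * (x - mu) for x in sample]

z_original = studentised_mean(sample, mu)
z_rescaled = studentised_mean(rescaled, mu)
print(z_original, z_rescaled)           # the two values agree up to rounding
```

Since the statistic is of degree zero in the deviations, no choice of the multiplier alters it.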

19.28. It is interesting to consider the relation between the studentised mean-statistic
and confidence zones based on sufficient estimators in the normal case. The
distribution of means and variances in normal samples is

$$dF \propto \frac{1}{\sigma}\exp\left\{-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}\right\} d\bar{x} \cdot \frac{s^{n-2}}{\sigma^{n-1}}\exp\left(-\frac{ns^2}{2\sigma^2}\right) ds, \tag{19.33}$$

and $\bar{x}$, $s$ are jointly sufficient for $\mu$, $\sigma$. In the sample space $W$ the regions of constant $\bar{x}$
are hyperplanes and those of constant $s$ are hyperspheres. If we fix $\bar{x}$ and $s$ the sample-point
$E$ lies on a hypersphere of $(n - 2)$ dimensions. Choose an area on this hypersphere
of content $\alpha$. Then the acceptance region will be obtained by combining all such areas
for all $\bar{x}$ and $s$.

One such region is seen to be the “slice” of the sample-space obtained by rotating
the hyperplane passing through the origin and the point $(1, 1, \ldots, 1)$ through an angle
$\pi\alpha$ (not $2\pi\alpha$, because a half-turn of the plane covers the whole space).

The situation is illustrated for $n = 2$ in Fig. 19.5.

For any given $\mu'$ the axis of rotation meets the hyperplane $\mu = \mu'$ in the point
$x_1 = x_2 = \mu'$, and the hypercones $(\bar{x} - \mu)/s = \text{constant}$ in the $W$ space become the plane
areas between two straight lines (shaded in the figure). These may be regarded as regions
of acceptance, and one set is that obtained by rotating a plane about the line $x_1 = x_2 = \mu$
through an angle so as to cut off in any plane $\mu = \mu'$ an angle $\pi\alpha/2$ on each side of
the line $x_1 - \mu' = -(x_2 - \mu')$.

The boundary planes are given by

$$x_1 - \mu = (x_2 - \mu)\tan\left(\frac{\pi}{4} \pm \frac{\beta}{2}\right),$$

where $\beta = \pi(1 - \alpha)$; or, after a little reduction,

$$\mu = \frac{x_1 + x_2}{2} \pm \frac{x_1 - x_2}{2}\cot\frac{\beta}{2}.$$

$\mu$ then lies in the region of acceptance if

$$\frac{x_1 + x_2}{2} - \frac{|x_1 - x_2|}{2}\cot\frac{\beta}{2} \le \mu \le \frac{x_1 + x_2}{2} + \frac{|x_1 - x_2|}{2}\cot\frac{\beta}{2}.$$

These are in fact the limits given by “Student's” distribution for $n = 2$, since the sample
variance then becomes $s^2 = \{(x_1 - x_2)/2\}^2$ and $\bar{x} = (x_1 + x_2)/2$, so that $z_0$ is given by

$$\frac{1}{\pi}\int_{z_0}^{\infty}\frac{dz}{1+z^2} = \frac{1}{\pi}\left(\frac{\pi}{2} - \tan^{-1} z_0\right) = \frac{1-\alpha}{2} = \frac{\beta}{2\pi},$$

that is, $z_0 = \cot(\beta/2)$.
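The case $n = 2$ can be tried out by simulation. The sketch below is a rough check under stated assumptions rather than part of the argument: it computes the limits $\bar{x} \pm s\cot(\beta/2)$ with $\beta = \pi(1-\alpha)$ and counts how often they enclose $\mu$ for two arbitrary choices of $\mu$ and $\sigma$. The function names, seeds and parameter values are hypothetical.

```python
import math
import random

def limits_n2(x1, x2, alpha):
    """Central limits for mu from a sample of two: mean -/+ s*cot(beta/2)."""
    beta = math.pi * (1.0 - alpha)
    half_width = abs(x1 - x2) / 2.0 / math.tan(beta / 2.0)   # s * cot(beta/2)
    centre = (x1 + x2) / 2.0
    return centre - half_width, centre + half_width

def coverage(mu, sigma, alpha, trials, seed):
    """Proportion of simulated pairs whose limits enclose the true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = limits_n2(rng.gauss(mu, sigma), rng.gauss(mu, sigma), alpha)
        hits += lo <= mu <= hi
    return hits / trials

cov_small = coverage(mu=1.0, sigma=0.5, alpha=0.9, trials=20000, seed=1)
cov_large = coverage(mu=-4.0, sigma=7.0, alpha=0.9, trials=20000, seed=2)
print(cov_small, cov_large)   # both should lie close to 0.9
```

The observed proportions should not differ systematically between the two runs, in keeping with the independence of the range from $\mu$ and $\sigma$.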





19.29. Tables or diagrams of the confidence intervals for selected values of $\alpha$ have
been given for the following parameters:

(a) the proportion $\varpi$ in the binomial (Clopper and Pearson, 1934);

(b) the parameter of the Poisson distribution (Garwood, 1936; Ricker, 1937);

(c) the correlation coefficient in normal samples (David, 1938a);

(d) the median in samples from any population (K. R. Nair, 1940b).

In addition, results for the mean of a normal population may be obtained from “Student's”
integral as shown above. Those for the variance of a normal population may be obtained
from the $\Gamma$-function or the equivalent $\chi^2$-integral. For simultaneous estimation of mean
and variance there are difficulties, as we proceed to show.
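Where tables are not to hand, limits of type (a) can be found numerically. The following Python sketch assumes the usual central construction, in which each tail of the binomial is given probability $(1-\alpha)/2$, and solves for the two limits by bisection. It is an illustration only: the function names are hypothetical, and no claim is made that it reproduces the published charts to their printed accuracy.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) when X is binomial with index n and proportion p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_increasing(f, target, lo=0.0, hi=1.0, iters=200):
    """Bisection for f(p) = target when f increases with p on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def central_binomial_limits(k, n, alpha):
    """Central limits for the proportion with confidence coefficient alpha."""
    tail = (1 - alpha) / 2
    # Lower limit solves P(X >= k | p) = tail; this probability increases with p.
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), tail)
    # Upper limit solves P(X <= k | p) = tail, i.e. P(X > k | p) = 1 - tail.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - tail)
    return lower, upper

lower, upper = central_binomial_limits(k=0, n=10, alpha=0.95)
print(lower, upper)
```

When no successes are observed the upper limit has the closed form $1 - \{(1-\alpha)/2\}^{1/n}$, which gives a convenient check on the bisection.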

19.30. It might have been expected that the foregoing theory could be generalised
to give simultaneous pairs of confidence intervals for two unknown parameters when
intervals for each separately cannot be found. Very little progress in this direction has,
however, been made. The difficulty may be illustrated by reference to the joint distribution
of mean and variance (19.33). From the independent distributions of $(\bar{x} - \mu)/\sigma$ and
$s/\sigma$ we can, given $\alpha$, $\beta$, find $t_0$, $t_1$ and $u_0$, $u_1$ such that

$$P\left\{-t_1 \le \frac{\bar{x} - \mu}{\sigma} \le t_0\right\} = \alpha, \qquad P\left\{u_0 \le \frac{s}{\sigma} \le u_1\right\} = \beta,$$

where the $t$'s and $u$'s depend only on sample values and $\alpha$, $\beta$ may be chosen at will. The
inequalities are equivalent to

$$\bar{x} - \sigma t_0 \le \mu \le \bar{x} + \sigma t_1 \tag{19.34}$$

$$\frac{s}{u_1} \le \sigma \le \frac{s}{u_0} \tag{19.35}$$

and these give

$$\bar{x} - \frac{t_0}{u_0}s \le \mu \le \bar{x} + \frac{t_1}{u_0}s. \tag{19.36}$$

But can we then infer that

$$P\left\{\bar{x} - \frac{t_0}{u_0}s \le \mu \le \bar{x} + \frac{t_1}{u_0}s\right\} = \gamma, \tag{19.37}$$

where $\gamma$ is a constant dependent on $\alpha$ and $\beta$? We cannot. This equation is, in fact,
not generally true. The fact can be verified by considering the distribution of the statistic
$\bar{x} - ks$ and showing that its distribution function $F(u)$ is not independent of $\mu$ and $\sigma$.


19.31. In the next chapter we shall see that a similar problem, giving rise to Behrens'
test, provides a crucial point of difference between the theory of confidence intervals and 
that of fiducial intervals. All we need say here is that from the point of view of the former 
the problem of simultaneous confidence intervals for several parameters remains unsolved, 
except of course in the degenerate case when we can find independent intervals for each 
parameter separately. 


19.32. In conclusion we indicate without proof a few results which have recently
been obtained.

(1) Wilks and Daly (1939b) have generalised the theorem of 19.12 to the case of several
parameters. Under fairly general conditions the confidence regions which are shortest
on the average are given by

$$\sum_{i}\sum_{j} a^{ij}\,\frac{\partial \log L}{\partial \theta_i}\,\frac{\partial \log L}{\partial \theta_j} \le \chi^2_\alpha,$$

where $(a^{ij})$ is the inverse matrix to that whose general element is

$$E\left(\frac{\partial \log L}{\partial \theta_i}\,\frac{\partial \log L}{\partial \theta_j}\right)$$

and $\chi^2_\alpha$ is such that $P(\chi^2 \le \chi^2_\alpha) = \alpha$, the probability being calculated from the
$\chi^2$-distribution with $\nu = l$. This is clearly related to the result of 17.46 giving the limiting forms
of variances and covariances of maximum likelihood estimators.

(2) Wald (1942) has considered the problem of large samples from the point of view
of most selective sets (“shortest” in Neyman's sense) and has proved results somewhat
similar to those of Wilks and Daly.

(3) Wald and Wolfowitz (1939b, 1941c) and Kolmogoroff (1941) have considered the
problem of setting confidence limits to the terminals of an unknown frequency-distribution.

NOTES AND REFERENCES 

When the theory of confidence intervals and that of fiducial intervals were first devel- 
oped many statisticians regarded them as equivalent. In papers written between 1930 
and 1938 “ confidence limits ” and “ fiducial limits ” are often used in the same sense ; 
and even where a distinction of approach was drawn the results given by the two methods 
appeared identical. The case of Behrens’ test, however, provided an illustration where 
the methods lead to different results — see the following chapter. 

The fiducial approach is due to R. A. Fisher, references being given at the end of
Chapter 20. The approach of the present chapter has been developed mainly by Neyman
(see particularly 1937b), E. S. Pearson, Wilks (1938b, c, 1939a and, with Daly, 1939b),
Wald (1939a, 1942), Welch (1939a), and Bartlett (1936a, 1939a). A number of the references
to Chapters 26 and 27 are also relevant.

Confidence intervals can be obtained for the median and other quantiles which are
independent of the form of distribution. See Thompson (1936), Savur (1937a) and K. R.
Nair (1940b), and compare Exercise 19.5.


EXERCISES 

19.1. Show that for the rectangular population

$$dF = \frac{dx}{\theta}, \qquad 0 \le x \le \theta,$$

and confidence coefficient $\alpha$, confidence limits for $\theta$ are $t$ and $t/\psi$ where $t$ is the sample range
and $\psi$ is given by

$$\psi^{n-1}\{n - (n-1)\psi\} = 1 - \alpha.$$

(Wilks, 1938c.)


19.2. Show that, for the distribution of the previous exercise, confidence limits
for samples of two, $x_1$ and $x_2$, are

$$\frac{x_1 + x_2}{1 + \sqrt{(1-\alpha)}}, \qquad \frac{x_1 + x_2}{1 - \sqrt{(1-\alpha)}}.$$

(Neyman, 1937b.)


19.3. Show also, in the case of the previous exercises, that if $L$ is the larger of a
sample of two, confidence limits are

$$L, \qquad \frac{L}{\sqrt{(1-\alpha)}}.$$

(Neyman, 1937b.)

Show further that if $M$ is the largest of samples of four, confidence limits are

$$M, \qquad \frac{M}{(1-\alpha)^{1/4}}.$$

(For an experimental verification, see Frankel and Kullback, 1940.)

19.4. Show that, for the distribution

$$dF = \theta e^{-\theta x}\,dx, \qquad 0 \le x < \infty,$$

central confidence limits for large samples with $\alpha = 0.95$ are given by

$$\theta = \left(1 \pm \frac{1.96}{\sqrt{n}}\right)\Big/\,\bar{x}.$$

(Wilks, 1938c.)


19.5. If a frequency function is continuous, the probability that the $k$th of a sample
of $n$ (arranged in ascending order of magnitude) lies in the range $dx$ is

$$\frac{F^{k-1}(1-F)^{n-k}}{B(k, n-k+1)}\,dF,$$

where $F$ is the distribution function. Deduce that

$$P\{x_k \le M \le x_{n-k+1}\} = 1 - 2I_{0.5}(n-k+1, k),$$

where $M$ is the median, and hence show how to determine confidence intervals for $M$ from
the incomplete B-function.

Generalise the result for quantiles. Show that the results do not hold for discontinuous
distributions.

(Thompson, 1936.)

FIDUCIAL INFERENCE

20.1. We now proceed to examine a type of inference known as fiducial. As in
other methods of estimation, given a distribution of known form depending on an unknown
parameter $\theta$, we shall attempt to find limits between which $\theta$ lies in some sense associated
with the theory of probability. To that extent our present approach is similar to the
use of estimators with their associated sampling error and to the use of confidence intervals;
but it is distinct from the latter both in essential ideas and in some of the results to which
it leads.


20.2. Consider samples of $n$ from a normal population of unknown mean $\mu$ and
unit variance. The sample-mean $\bar{x}$ is sufficient for $\mu$ and its distribution is

$$dF = \sqrt{\frac{n}{2\pi}}\exp\left\{-\frac{n(\bar{x}-\mu)^2}{2}\right\} d\bar{x}. \tag{20.1}$$

In speaking of a distribution in this sense we regard $\mu$ as fixed and consider the totality
of values of $\bar{x}$ derived by random sampling from the population with given $\mu$. The
proportion of samples falling in a range $d\bar{x}$ is then given by (20.1), which holds for each
value of $\mu$.

We now change our viewpoint and consider a different kind of distribution based on
(20.1). If we are given a value of $\bar{x}$ from a sample, what are the values of $\mu$ which could
have given rise to this value to any fixed level of probability? If the deviation $\bar{x} - \mu$ is
written as $h$, we know that the probability of the inequality

$$\bar{x} - \mu \le h \tag{20.2}$$

being true is $\alpha$, where $\alpha$ depends on $h$ and is in fact

$$\alpha = \sqrt{\frac{n}{2\pi}}\int_{-\infty}^{h}\exp\left(-\frac{nx^2}{2}\right) dx. \tag{20.3}$$

Looking at this the other way round, we may say that given any $\alpha$ we can find $h$, a function
of $\alpha$ only, such that

$$\mu \ge \bar{x} - h \tag{20.4}$$

is true with probability $\alpha$. For any fixed $\bar{x}$ this gives us a distribution of $\mu$. Consider
in fact the equation

$$\mu = \bar{x} - h. \tag{20.5}$$

If $\mu$ has a distribution function $F(\mu)$, we have, since (20.4) is true with probability $\alpha$,

$$1 - \alpha = F(\mu) = 1 - \sqrt{\frac{n}{2\pi}}\int_{-\infty}^{h}\exp\left(-\frac{nx^2}{2}\right) dx,$$

whence

$$f(\mu)\,d\mu = -\sqrt{\frac{n}{2\pi}}\exp\left(-\frac{nh^2}{2}\right) dh.$$

But in virtue of (20.5), $d\mu = -dh$ and $-h = \mu - \bar{x}$. Thus

$$f(\mu)\,d\mu = \sqrt{\frac{n}{2\pi}}\exp\left\{-\frac{n(\mu-\bar{x})^2}{2}\right\} d\mu. \tag{20.6}$$

This is called the fiducial distribution of $\mu$.

20.3. It so happens that in this example the non-differential parts of (20.6) and
(20.1) are the same. This is not essential although it is not infrequent. The crucial
point of difference, however, lies in the appearance of the differential element $d\mu$, relating
to the variation of $\mu$, and the disappearance of $d\bar{x}$ relating to the variation of $\bar{x}$. We have
derived a distribution of the parameter $\mu$ from that of the random variable $\bar{x}$ by transferring
our attention in (20.4) from $\bar{x}$ to $\mu$ and regarding the inequality as still satisfied
with probability $\alpha$.

20.4. We note in the first place that this distribution is not necessarily existent.
When we come to make an inference in any particular case we do not assume that $\mu$ is
itself distributed in the fiducial form in the sense that it has been chosen at random from
an existent population of $\mu$'s of that form. Such a prior distribution, which would be
required for the application of Bayes' theorem, is not admissible from the point of view
of the frequency theory of probability. The fiducial distribution is a hypothetical one of
conceivable values of $\mu$. We attach probabilities to these values, or rather to values in the
range $d\mu$, by identifying them with the probabilities (based on frequency) which are derived
from the distribution of a sufficient estimator of $\mu$. For this reason the fiducial distribution
is not a frequency-distribution in the ordinary sense; but it is a probability distribution
in its own special sense. We use it to make statements of the kind: among the values
of $\mu$ which are possible, only those in a certain range give rise to the observed $\bar{x}$ with
probability $\alpha$, and hence we will locate $\mu$ in that range.


20.5. In our present example the argument would proceed as follows. From equation
(20.6) and the use of the normal integral, the probability that $\mu - \bar{x}$ does not exceed a
certain $h$ is ascertainable as a function of $h$; for instance,

$$P\{\mu - \bar{x} \le 2/\sqrt{n}\} = 0.97725.$$

If we regard a probability as high as this as acceptable, we may say that $\mu \le \bar{x} + 2/\sqrt{n}$.

This result is equivalent to that given by the theory of confidence intervals, for if
we assert $\mu \le \bar{x} + 2/\sqrt{n}$ we shall be right in the long run in 97.725 per cent of the cases. This
identity of result is found in most elementary cases where a single parameter is concerned,
but is to be regarded as accidental. In the theory of confidence intervals it is fundamental
(a) that the assertion as to the parameter lying in a given range should be true in an assigned
proportion $\alpha$ of the cases, and (b) that no assumption need be made as to the prior
distribution of the parameter, either in the frequency sense or in the fiducial sense. In fiducial
theory it is not necessary that (a) should be true, but the fiducial distribution is
a fundamental part of the inference.
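The agreement of the two readings in this example can be seen numerically. The following Python sketch is offered only as an illustration: it evaluates the normal integral at 2 for the fiducial probability and, separately, counts how often the assertion $\mu \le \bar{x} + 2/\sqrt{n}$ is correct in repeated sampling from a population with a fixed mean. The sample size, the value of `mu` and the seed are arbitrary assumptions.

```python
import random
from statistics import NormalDist

n = 25
# Fiducial probability that mu - xbar <= 2/sqrt(n): the normal integral at 2.
fiducial_probability = NormalDist().cdf(2.0)

# Long-run frequency of the assertion mu <= xbar + 2/sqrt(n) for a fixed mu.
rng = random.Random(12345)
mu = 0.7                     # hypothetical true mean, unknown in practice
trials = 40000
correct = 0
for _ in range(trials):
    xbar = rng.gauss(mu, 1.0 / n ** 0.5)    # sampling distribution (20.1)
    correct += mu <= xbar + 2.0 / n ** 0.5
frequency = correct / trials
print(fiducial_probability, frequency)
```

The first number comes from the fiducial distribution (20.6) and the second from the frequency distribution (20.1); that they agree here is the coincidence of result described above.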


20.6. There is a further distinction between the two theories. In that of confidence 
intervals it is possible to have two entirely different sets for the same parameter, and in 
fact part of that theory is devoted to finding “ best ” sets among the possible ones. In 
fiducial theory such a state of affairs must not be possible, for different limits would imply 
different fiducial distributions for the same parameter on the same evidence. This is avoided 
by confining fiducial distributions to those based on sufficient estimators, or more generally 
on a set of estimators which together avoid all loss of information. Since such estimators 
alone contain all the information relevant to the problem of estimation they alone can 
give the fiducial distributions accurately. It follows, of course, that where no sufficient
estimator (or estimator with complete set of ancillary estimators) can be found, the
fiducial method is inapplicable.


20.7. Generally, let $F(t, \theta)$ be the distribution function of a sufficient estimator $t$
for a parameter $\theta$. Then for the frequency distribution of $t$ we have

$$dF = \frac{\partial F(t, \theta)}{\partial t}\,dt. \tag{20.7}$$

$F(t, \theta)$ is the probability that a random value of the estimator does not exceed a given
value $t$. In accordance with the fiducial principle, this may be equated to the probability
that for fixed $t$ the value of the parameter will exceed $\theta$, so that for the fiducial distribution
of $\theta$ we have

$$dF = \frac{\partial}{\partial\theta}\{1 - F(t, \theta)\}\,d\theta = -\frac{\partial F(t, \theta)}{\partial\theta}\,d\theta. \tag{20.8}$$

This shows the general relation between the frequency-distribution of the estimator and
the fiducial distribution of the parameter.
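As a hedged numerical illustration of (20.8), the sketch below takes the normal case of 20.2, where $F(t, \theta)$ is the distribution function of the mean of $n$ values with unit variance, differentiates it numerically with respect to $\theta$, and compares the result with the closed form (20.6). The function names, the step length and the numerical values are assumptions made for the example.

```python
from statistics import NormalDist

def F(t, theta, n):
    """Distribution function of the mean t of n unit-variance normal values."""
    return NormalDist(mu=theta, sigma=n ** -0.5).cdf(t)

def fiducial_density(theta, t, n, step=1e-5):
    """Approximate -dF/dtheta by a central difference, as in (20.8)."""
    return -(F(t, theta + step, n) - F(t, theta - step, n)) / (2 * step)

n, t = 16, 1.3          # hypothetical sample size and observed mean
theta = 1.1
numerical = fiducial_density(theta, t, n)
closed_form = NormalDist(mu=t, sigma=n ** -0.5).pdf(theta)   # form (20.6)
print(numerical, closed_form)
```

The same numerical differentiation could be applied to any $F(t, \theta)$ that decreases steadily as $\theta$ increases; the normal case is used here only because its closed form is available for comparison.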


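Example 20.1

If $p$ is known, the estimator $\hat\theta = \bar{x}/p$ is sufficient for $\theta$ in samples from

$$dF = \frac{x^{p-1} e^{-x/\theta}}{\theta^{p}\,\Gamma(p)}\,dx, \qquad 0 \le x < \infty,$$

the distribution of $\hat\theta$ being, in fact,

$$dF = \left(\frac{np}{\theta}\right)^{np}\frac{\hat\theta^{\,np-1}}{\Gamma(np)}\exp\left(-\frac{np\hat\theta}{\theta}\right) d\hat\theta.$$

(Cf. Example 17.8.) We may write this in the form

$$dF = \frac{1}{\Gamma(np)}\left(\frac{np\hat\theta}{\theta}\right)^{np}\exp\left(-\frac{np\hat\theta}{\theta}\right)\frac{d\hat\theta}{\hat\theta}. \tag{20.9}$$

It is then clear that, since

$$\hat\theta\,\frac{\partial F}{\partial\hat\theta} = -\theta\,\frac{\partial F}{\partial\theta},$$

the corresponding fiducial distribution of $\theta$ is

$$dF = \frac{1}{\Gamma(np)}\left(\frac{np\hat\theta}{\theta}\right)^{np}\exp\left(-\frac{np\hat\theta}{\theta}\right)\frac{d\theta}{\theta}, \tag{20.10}$$

which may also be put in the form (20.9), provided that we interpret the differential element
now as relating to $\theta$ and not to $\hat\theta$. It will be noticed that we have replaced $d\hat\theta/\hat\theta$ by $d\theta/\theta$,
not merely by $d\theta$.

From the fiducial distribution (20.10) we can find the probability that $\theta$ lies in a certain
range dependent on the observed $\hat\theta$ and the chosen probability $\alpha$. This is in fact the same
range that we should obtain by applying confidence intervals to (20.9). Once again the
results of the two methods are the same.

Fiducial Inference based on “Student's” Distribution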

20.8. Consider now the estimation of the mean $\mu$ in samples from a normal population
with unknown variance $\sigma^2$. The treatment of 20.2 is no longer of use, for it would
result in a fiducial distribution of $\mu$ containing the unknown $\sigma$. We therefore “studentise”
the problem by considering the distribution of

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s'}, \tag{20.11}$$

which is independent of $\sigma$, being in fact

$$dF \propto \frac{dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu+1)}}, \tag{20.12}$$

where $\nu = n - 1$. Here $s'^2$ is the unbiassed estimate of the population variance,

$$s'^2 = \frac{1}{n-1}\sum (x - \bar{x})^2.$$

The distribution of $t$ may be written

$$dF \propto \frac{d\bar{x}}{\left\{1 + \dfrac{n(\bar{x}-\mu)^2}{\nu s'^2}\right\}^{\frac{1}{2}(\nu+1)}}. \tag{20.13}$$

The fiducial distribution is then

$$dF \propto \frac{d\mu}{\left\{1 + \dfrac{n(\mu-\bar{x})^2}{\nu s'^2}\right\}^{\frac{1}{2}(\nu+1)}}. \tag{20.14}$$

In the usual way we can find two constants, for any given $\alpha$, such that, from (20.14),

$$P\{\mu_0 \le \mu \le \mu_1\} = \alpha, \tag{20.15}$$

the probability being based on (20.14) and therefore to be understood in the fiducial sense.
Had we worked with (20.12) or (20.13) we should have found $t_0$ such that

$$P\{-t_0 \le t \le t_0\} = \alpha,$$

which is equivalent to

$$P\left\{\bar{x} - \frac{s' t_0}{\sqrt{n}} \le \mu \le \bar{x} + \frac{s' t_0}{\sqrt{n}}\right\} = \alpha. \tag{20.16}$$

This may be interpreted in the sense of confidence intervals, i.e. that in asserting the
inequality in (20.16) we should be right in a proportion $\alpha$ of the cases in the long
run. (20.15) does not rest on this statement as to frequency, though the limits to which
it leads are the same and the statement happens to be true.


20.9. The case we have just discussed raises a new point. Is it still true that
the fiducial distribution is unique, and is it consistent with the distributions of $\mu$ and $\sigma$
separately? The distribution is based only on the sufficient estimators $\bar{x}$ and $s'$ (which
are jointly but not separately sufficient for $\mu$ and $\sigma$) and we should expect this to be so.
But the matter requires investigation, for we are here using a fiducial distribution based on
two estimators.

The simultaneous distribution of $\bar{x}$ and $s'$ is

$$dF \propto \frac{1}{\sigma}\exp\left\{-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}\right\} d\bar{x}\;\left(\frac{s'}{\sigma}\right)^{n-1}\exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\}\frac{ds'}{s'}. \tag{20.17}$$

If we were considering fiducial limits for $\mu$ with known $\sigma$ we should use the distribution

$$dF \propto \frac{1}{\sigma}\exp\left\{-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}\right\} d\bar{x}.$$

If we were considering fiducial limits for $\sigma$ with known $\mu$ we should not use the other factor
in (20.17),

$$dF \propto \left(\frac{s'}{\sigma}\right)^{n-1}\exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\}\frac{ds'}{s'}, \tag{20.18}$$

for in such circumstances $s'$ is not sufficient for $\sigma$, the appropriate estimator being
$\frac{1}{n}\sum (x - \mu)^2$. The question is, what form of fiducial distribution must hold for $\sigma$ in order
that the “Student” form (20.14) should hold for $\mu$ when $\sigma$ is unknown?

Suppose the fiducial distribution is $f(s', \sigma)\,d\sigma$. We have then for the joint fiducial
distribution of $\mu$ and $\sigma$,

$$dF \propto \frac{1}{\sigma}\exp\left\{-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}\right\} d\mu\; f(s', \sigma)\,d\sigma.$$

We have therefore to solve

$$\left\{\int_0^\infty \frac{1}{\sigma}\exp\left[-\frac{n(\mu-\bar{x})^2}{2\sigma^2}\right] f(s', \sigma)\,d\sigma\right\} d\mu = \frac{k\,d\mu}{\left\{1 + \dfrac{n(\mu-\bar{x})^2}{s'^2(n-1)}\right\}^{n/2}}, \tag{20.19}$$

where $k$ is some constant. Putting $(\mu - \bar{x})^2 = \alpha$, we have then to solve

$$\int_0^\infty \frac{1}{\sigma}\exp\left(-\frac{n\alpha}{2\sigma^2}\right) f(s', \sigma)\,d\sigma = \frac{k}{\left\{1 + \dfrac{n\alpha}{(n-1)s'^2}\right\}^{n/2}}.$$

Regarding $\alpha$ as the complex quantity $it$ we see that the left-hand side is the characteristic
function of a frequency function in the variable $n/(2\sigma^2)$, while the right-hand side is the
characteristic function of a Type III distribution. Inversion gives

$$\frac{1}{\sigma}\,f(s', \sigma)\,d\sigma \propto \left(\frac{n}{2\sigma^2}\right)^{\frac{n}{2}-1}\exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\} d\left(\frac{n}{2\sigma^2}\right),$$

from which we find

$$f(s', \sigma)\,d\sigma \propto \frac{1}{\sigma^{n}}\exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\} d\sigma,$$

or, on evaluation of the constant,

$$f(s', \sigma)\,d\sigma = \frac{2}{\Gamma\left(\dfrac{n-1}{2}\right)}\left\{\frac{(n-1)s'^2}{2\sigma^2}\right\}^{\frac{n-1}{2}}\exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\}\frac{d\sigma}{\sigma}. \tag{20.20}$$

This, then, is the fiducial distribution which $\sigma$ must obey. We should have arrived at
the same result had we taken (20.18) and transformed it to the fiducial form, as if it related
to $s'$ and $\sigma$ only and the former were sufficient for the latter.

It appears, then, that in this case at least the fiducial method gives consistent results
when two parameters are involved. The general problem of many parameters presents
difficulties and has not been elucidated to any great extent.

The Logic of Fiducial Inference 

20.10. The notion of fiducial probability was introduced by Fisher (1930) for the
case of a single parameter. Regarding the estimate $t$ as fixed, Fisher considers the
distribution of values of $\theta$ for which $t$ can be regarded as a representative estimate: representative,
that is to say, in the sense that it could have arisen by random sampling from the
population specified by $\theta$. As pointed out above, this does not mean that we are regarding
the true value of $\theta$ as a member of an existing population. Rather, we are considering the
possible values of $\theta$ and attaching to each value a measure of our confidence in it, based
on the probability that it could have given rise to the observed $t$.

If I interpret him correctly, Fisher would regard a fiducial distribution as a frequency-distribution.
This implies that $\theta$ is regarded as a random variable. It appears to me,
however, that it is not a random variable in the ordinary sense of the frequency theory
of probability, in which values of $\theta$ either are or can be generated by an actual sampling
process. We can never test whether the fiducial distribution holds in the frequency sense
by drawing a number of values and comparing observation with theory. Nor, in calculating
fiducial limits of the type $\theta = t + h(\alpha)$, do we imply that the proportion of cases
for which $\theta \le t + h$ is true will be $\alpha$ in the long run.

20.11. The reader has a choice of several attitudes towards the foundations of the
fiducial argument: (a) he can accept the argument as involving a new postulate of inference;
(b) he can regard it as sanctioned by the approach of the previous section; or (c) he
can, so far as estimates based on a single parameter are concerned, console himself with
the thought that the results of the process are the same as those given by the theory of
confidence intervals.

20.12. Although Fisher is careful to emphasise the distinction between his own
approach and that based on Bayes' postulate, it is interesting to note that the theory of
inverse probability as modified by Jeffreys gives results which are in many cases identical
with those of fiducial inference.

In the example of 20.2, for instance, suppose that the prior distribution of $\mu$ is $f(\mu)\,d\mu$.
Then for any given $\bar{x}$ the posterior probability of $\mu$ is

$$dF = f(\mu)\,d\mu\,\sqrt{\frac{n}{2\pi}}\exp\left\{-\frac{n(\bar{x}-\mu)^2}{2}\right\}. \tag{20.21}$$

If the total probability is unity we have

$$\sqrt{\frac{n}{2\pi}}\int_{-\infty}^{\infty} f(\mu)\exp\left\{-\frac{n(\bar{x}-\mu)^2}{2}\right\} d\mu = 1. \tag{20.22}$$

Clearly $f(\mu) = 1$ is a solution, and we may use characteristic functions to show that it is
the only solution. In fact we have from (20.22), writing $it$ for $n\bar{x}$,

$$\int_{-\infty}^{\infty} f(\mu)\exp\left(-\frac{n\mu^2}{2}\right) e^{it\mu}\,d\mu = \sqrt{\frac{2\pi}{n}}\exp\left(-\frac{t^2}{2n}\right),$$

and hence, by inversion,

$$f(\mu)\exp\left(-\frac{n\mu^2}{2}\right) = \exp\left(-\frac{n\mu^2}{2}\right),$$

or $f(\mu) = 1$.

We have, then, for the posterior probability distribution of $\mu$,

$$dF = \sqrt{\frac{n}{2\pi}}\exp\left\{-\frac{n(\mu-\bar{x})^2}{2}\right\} d\mu, \tag{20.23}$$

which is the same as the fiducial distribution. The requirement that $f(\mu) = 1$ is equivalent
to a prior distribution of $\mu$, $dF = d\mu$, which is the form given by Bayes' postulate for a
parameter which can extend to infinity in either direction.
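The identity of the posterior and fiducial distributions in this example can be checked numerically. The sketch below is illustrative only: it integrates the likelihood against the uniform prior $f(\mu) = 1$ by the midpoint rule and compares the resulting probability that $\mu \le \bar{x} + 2/\sqrt{n}$ with the value given by the normal form (20.23). The function name, the grid settings and the data summary are assumptions.

```python
import math
from statistics import NormalDist

def posterior_cdf_flat_prior(bound, xbar, n, width=12.0, cells=20000):
    """P(mu <= bound | xbar) under f(mu) = 1, by midpoint-rule integration."""
    sd = 1.0 / math.sqrt(n)
    lo, hi = xbar - width * sd, xbar + width * sd
    h = (hi - lo) / cells
    total = below = 0.0
    for i in range(cells):
        mu = lo + (i + 0.5) * h
        weight = math.exp(-0.5 * n * (mu - xbar) ** 2)   # likelihood times 1
        total += weight
        if mu <= bound:
            below += weight
    return below / total

n, xbar = 9, 2.0                       # hypothetical data summary
bound = xbar + 2.0 / math.sqrt(n)
posterior = posterior_cdf_flat_prior(bound, xbar, n)
fiducial = NormalDist(mu=xbar, sigma=1.0 / math.sqrt(n)).cdf(bound)
print(posterior, fiducial)
```

The two probabilities agree to within the error of the numerical integration, as the algebra leading to (20.23) requires.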

Example 20.2

In Example 20.1, a similar argument leads to a prior distribution of $\theta$,

$$dF \propto \frac{d\theta}{\theta}.$$

This is the form given by Jeffreys' modification of Bayes' postulate when a parameter
can extend to infinity in only one direction.

It does not appear, however, that fiducial and inverse probability always give the
same results. Consider the distribution of the correlation coefficient in normal samples
(14.14):

$$dF \propto (1-\rho^2)^{\frac{n-1}{2}}(1-r^2)^{\frac{n-4}{2}}\,\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\cos^{-1}(-\rho r)}{\sqrt{(1-\rho^2 r^2)}}\right\} dr. \tag{20.24}$$

The argument of the type we have just employed would require a prior distribution of $\rho$,

$$dF \propto \frac{d\rho}{(1-\rho^2)^{3/2}},$$

and the resulting posterior distribution (which is equivalent to that obtained by interchanging
$r$ and $\rho$ in (20.24)) is not the same as we should get by using equation (20.8).

Behrens' Test

20.13. Suppose we have two samples of $n_1$ and $n_2$ members from normal populations
with possibly unequal variances. The fiducial distributions of $\mu_1$ and $\mu_2$ are of the
“Student” form (20.14). Writing

$$\mu_1 = \bar{x}_1 - s_1' t_1, \qquad \mu_2 = \bar{x}_2 - s_2' t_2,$$

where $s_1'^2$ and $s_2'^2$ are here to be read as the estimated variances of the two means, we have

$$\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2) = s_1' t_1 - s_2' t_2. \tag{20.25}$$

If now

$$\epsilon = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{(s_1'^2 + s_2'^2)}}, \tag{20.26}$$

$\epsilon$ depends only on the known quantities $\bar{x}$ and $s'$ and the difference of means $\mu_1 - \mu_2$.
From the fiducial distributions of $\mu_1$ and $\mu_2$ we can find that of $\epsilon$, and hence make fiducial
statements of the type

$$\bar{x}_1 - \bar{x}_2 - \epsilon_0\sqrt{(s_1'^2 + s_2'^2)} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + \epsilon_1\sqrt{(s_1'^2 + s_2'^2)}. \tag{20.27}$$

20.14. The distribution of $\epsilon$ is not of a simple form. Putting $\tan\psi = s_2'/s_1'$ we see that

$$\epsilon = t_1\cos\psi - t_2\sin\psi, \tag{20.28}$$

so that $\epsilon$ is distributed fiducially as the weighted difference of two variables, each of which
is distributed as “Student's” $t$. We have then to find the distribution of

$$\epsilon = t_1\cos\psi - t_2\sin\psi$$

where the joint distribution of $t_1$ and $t_2$ is given by

$$dF \propto \frac{dt_1\,dt_2}{\left(1 + \dfrac{t_1^2}{\nu_1}\right)^{\frac{1}{2}(\nu_1+1)}\left(1 + \dfrac{t_2^2}{\nu_2}\right)^{\frac{1}{2}(\nu_2+1)}}, \qquad \nu_1 = n_1 - 1,\ \nu_2 = n_2 - 1. \tag{20.29}$$

The distribution has been studied by Sukhatme (1938b) and in more detail by Fisher
(1941a). Tables are given for various values of $n_1$, $n_2$ and the ratio $s_1'/s_2'$ (or the equivalent
angle $\psi$) showing the values of $\epsilon$ corresponding to given probability levels. Some of
the tables are included in the second (1943) edition of Fisher and Yates' Statistical Tables for
Agricultural, Biological and Medical Research.

20.15. The joint distribution of $s_1'^2$ and $s_2'^2$ is

$$dF \propto \left(\frac{s_1'^2}{\sigma_1^2}\right)^{\frac{1}{2}(n_1-3)}\left(\frac{s_2'^2}{\sigma_2^2}\right)^{\frac{1}{2}(n_2-3)}\exp\left\{-\frac{(n_1-1)s_1'^2}{2\sigma_1^2} - \frac{(n_2-1)s_2'^2}{2\sigma_2^2}\right\}\frac{ds_1'^2\,ds_2'^2}{\sigma_1^2\sigma_2^2}.$$

Putting

$$\rho = \frac{s_1'^2}{s_2'^2} \quad\text{and}\quad u = \frac{1}{2}\left\{\frac{(n_1-1)s_1'^2}{\sigma_1^2} + \frac{(n_2-1)s_2'^2}{\sigma_2^2}\right\},$$

we find, on a little reduction,

$$dF \propto \frac{\rho^{\frac{1}{2}(n_1-3)}\,d\rho}{\left\{(n_2-1) + (n_1-1)\rho\,\dfrac{\sigma_2^2}{\sigma_1^2}\right\}^{\frac{1}{2}(n_1+n_2-2)}}\;u^{\frac{1}{2}(n_1+n_2-4)}e^{-u}\,du. \tag{20.30}$$

Thus $u$ is distributed (independently of $\rho$) in the Type III form. Further,
$(\bar{x}_1 - \mu_1) - (\bar{x}_2 - \mu_2)$ is distributed normally about zero mean with variance $\sigma_1^2 + \sigma_2^2$
(the $\sigma^2$, like the $s'^2$, referring to the means).
Hence, if $\theta = \sigma_1^2/\sigma_2^2$ we find that the quotient

$$\frac{\{(\bar{x}_1 - \mu_1) - (\bar{x}_2 - \mu_2)\}^2\,(n_1+n_2-2)}{(\sigma_1^2 + \sigma_2^2)\left\{\dfrac{(n_1-1)s_1'^2}{\sigma_1^2} + \dfrac{(n_2-1)s_2'^2}{\sigma_2^2}\right\}} = \frac{\epsilon^2(1+\rho)(n_1+n_2-2)}{\left\{(n_2-1) + (n_1-1)\dfrac{\rho}{\theta}\right\}(1+\theta)} \tag{20.31}$$

is distributed as $t^2$ with $n_1 + n_2 - 2$ degrees of freedom. (Cf. Example 10.17, vol. I,
p. 248, for the distribution of a normal variate divided by a Type III variate.)

Now if we knew $\theta$ we could find fiducial (or confidence) limits to $\epsilon$, and hence to $\mu_1 - \mu_2$,
in the usual way, for the distribution of $\epsilon$ would then be independent of unknown constants
and ascertainable from “Student's” integral. Since, however, $\theta$ is not known, we require
in turn the fiducial distribution of this quantity. Since

$$z = \frac{1}{2}\log\left(\frac{s_1'^2\,\sigma_2^2}{s_2'^2\,\sigma_1^2}\right)$$

is distributed in Fisher's form (cf. Example 10.18, vol. I, p. 249), the required fiducial
form for $\theta$ can be obtained from that of $z$, which incidentally is equivalent to that of $\rho$
in (20.30). If we express (20.31) as the joint fiducial distribution of $\epsilon$ and $\theta$ and integrate
out for $\theta$, we shall be left with an equivalent form to that derived from (20.29).

20.16. It also follows from the above that the inequality (20.27) is not satisfied in
proportion $\alpha$ of the cases independently of $\theta$, so that the limits to $\mu_1 - \mu_2$ are not confidence
limits, although they are fiducial limits. It will, in fact, be evident enough from (20.31)
that if we determine $t_0$ and $t_1$ so that the integral of “Student's” form between those
limits is $\alpha$, then the corresponding limits for $\epsilon$, say $\epsilon_0$ and $\epsilon_1$, are dependent on the variance
ratio $\theta = \sigma_1^2/\sigma_2^2$. This is fairly evident on general grounds, and the point has been put
beyond doubt by both Fisher (1937b) and Neyman (1941a), who have worked out particular
cases of difference.

The fiducial distribution of $\epsilon$ (which is an extension by Fisher of a result given by
Behrens as early as 1929) thus provides a crucial point of difference between the theory of
fiducial inference and that of confidence intervals.
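In the absence of the published tables, probability levels of $\epsilon = t_1\cos\psi - t_2\sin\psi$ can be approximated by simulation. The Python sketch below is a rough Monte Carlo illustration, not the method by which the tables were computed: it draws two independent “Student” variables, forms $\epsilon$, and reads off an empirical quantile. The function names, the degrees of freedom, the angle and the number of draws are all assumptions made for the example.

```python
import math
import random

def student_t(rng, nu):
    """Draw from Student's t with nu degrees of freedom (normal / root chi-square)."""
    chi_square = rng.gammavariate(nu / 2.0, 2.0)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi_square / nu)

def behrens_quantile(prob, nu1, nu2, psi, draws=40000, seed=0):
    """Monte Carlo quantile of eps = t1*cos(psi) - t2*sin(psi)."""
    rng = random.Random(seed)
    eps = sorted(
        student_t(rng, nu1) * math.cos(psi) - student_t(rng, nu2) * math.sin(psi)
        for _ in range(draws)
    )
    return eps[min(draws - 1, int(prob * draws))]

# Hypothetical case: 6 and 12 degrees of freedom, with psi = 30 degrees.
q = behrens_quantile(0.975, nu1=6, nu2=12, psi=math.radians(30))
# With very many degrees of freedom eps is nearly a unit normal variable.
q_normal_like = behrens_quantile(0.975, nu1=2000, nu2=2000, psi=math.radians(45))
print(q, q_normal_like)
```

A simulation of this kind gives only a few reliable figures; for exact work the tabulated values cited in 20.14 are to be preferred.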


20.17. In conclusion, we will indicate the viewpoint of Jeffreys towards the type of 
problem dealt with by “ Student’s ” distribution for limits to the mean and Behrens’ 
distribution for limits to the difference of two means. 

If H denotes the general data, we have for the ‘‘Student” distribution — 

I- dt 

P {dt I <r. H} = ... (20.32) 

r +7) 


The expression on the left states the probability that t will lie in a given range dt on the 
assumption that H is true, the parent mean being [.i and the parent variance o'®. Since 
/i and cr do not appear on the right they are irrelevant and may be suppressed, and hence 


P{dt\H) 


k dt 


f2\4(v+l) 


Suppose now that we assume that 

P{dt\x, s, H] =f{t) dt. 


. (20.33) 


. (20.34) 


Then, as before, x and ,s' may be suppressed and we have 

P [dt\H} =f{t)dt, . 
and hence, by comparison with (20.33), 

P [dt 1 , s, H } = 


k dt 


1 + - 


(20.35) 


(20.36) 


We can then proceed to find limits to $t$, given $\bar{x}$ and $s$, in the usual way. Jeffreys emphasises,
however, that this depends on a new postulate expressed by (20.34) which, though
natural, is not trivial. It amounts to an assumption that if we are comparing different
distributions, samples from which give different $\bar{x}$'s and $s$'s, the scale of the distribution
of $\mu$ must be taken proportional to $s$ and its mean displaced by the difference of sample
means.

20.18. In a similar way it will be found that to arrive at the Behrens distribution
it is necessary to postulate that

$$P\{dt_1\,dt_2 \mid \bar{x}_1, \bar{x}_2, s_1, s_2, H\} = f_1(t_1)\,f_2(t_2)\,dt_1\,dt_2 \qquad (20.37)$$

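Jeffreys’ derivation of the Behrens form from Bayes’ theorem would be as follows.
The prior probability of $d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid H$ is

$$P\{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid H\} \propto \frac{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2}{\sigma_1\sigma_2}.$$

The likelihood (denoting the data by $D$) is

$$P\{D \mid \mu_1, \mu_2, \sigma_1, \sigma_2, H\} \propto \frac{1}{\sigma_1^{n_1}\sigma_2^{n_2}} \exp\left[-\frac{n_1}{2\sigma_1^2}\left\{(\mu_1 - \bar{x}_1)^2 + s_1^2\right\} - \frac{n_2}{2\sigma_2^2}\left\{(\mu_2 - \bar{x}_2)^2 + s_2^2\right\}\right].$$

Hence, by Bayes’ theorem

$$P\{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid D, H\} \propto \frac{1}{\sigma_1^{n_1+1}\sigma_2^{n_2+1}} \exp\left[-\frac{n_1}{2\sigma_1^2}\left\{(\mu_1 - \bar{x}_1)^2 + s_1^2\right\} - \frac{n_2}{2\sigma_2^2}\left\{(\mu_2 - \bar{x}_2)^2 + s_2^2\right\}\right] d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2.$$

Integrating out the values of $\sigma_1$ and $\sigma_2$, we find for the posterior distribution of $\mu_1$ and $\mu_2$
a form which is easily reducible to (20.29).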
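
The integration over the standard deviations can be sketched as follows. This is only an outline under the assumption that the posterior has the product form $\sigma_i^{-(n_i+1)}\exp[-n_i A_i/(2\sigma_i^2)]$ with $A_i = (\mu_i - \bar{x}_i)^2 + s_i^2$; the symbols $A_i$, $u$, $t_i$ and $\nu_i$ are introduced here for illustration and the final reduction to (20.29) is not reproduced.

```latex
% One factor at a time: substitute u = n A / (2 sigma^2).
\int_0^\infty \sigma^{-(n+1)} \exp\!\left(-\frac{nA}{2\sigma^2}\right) d\sigma
  \;=\; \tfrac{1}{2}\,\Gamma\!\left(\tfrac{n}{2}\right)
        \left(\frac{nA}{2}\right)^{-n/2}
  \;\propto\; A^{-n/2}.

% Hence the joint posterior of the two means factorises:
P\{d\mu_1\, d\mu_2 \mid D, H\} \;\propto\;
  \prod_{i=1}^{2} \left\{ (\mu_i - \bar{x}_i)^2 + s_i^2 \right\}^{-n_i/2} d\mu_1\, d\mu_2 .

% Writing t_i = (\mu_i - \bar{x}_i)\sqrt{n_i - 1}/s_i and \nu_i = n_i - 1,
% each factor is of "Student's" form:
\left\{ (\mu_i - \bar{x}_i)^2 + s_i^2 \right\}^{-n_i/2}
  \;\propto\; \left( 1 + \frac{t_i^2}{\nu_i} \right)^{-\frac{1}{2}(\nu_i + 1)} .
```

On this reading the two means are, a posteriori, independently distributed in “Student’s” form about their sample values, which is the starting point of the Behrens distribution.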


20.19. To sum up : so far as concerns problems of estimation the Behrens test is 
accurate both in fiducial theory and in the theory of probability propounded by Jeffreys. 
But the test does not hold in the theory of confidence intervals. In fact the latter fails 
to provide an exact solution to the problem, though we shall see below (21.28) that approxi- 
mations are possible. Fisher has criticised confidence intervals on the ground that they 
do not give an answer to what is admittedly an important question ; but it appears possible 
to maintain consistently that some questions may not have an answer. 


NOTES AND REFERENCES 

For the general theory of fiducial inference see Fisher (1930a, 1933, 1935a, b, 1936c,
1941a). The difficulties of reconciling Behrens’ test with confidence-interval theory were
noticed by Bartlett (1936a) and led to some controversy, for which see Fisher (1937b,
1939a, 1940c), Bartlett (1939a), Yates (1939/), and Neyman (1941a). For Jeffreys’ views
see his papers of 1937b, 1938c, 1939d and 1940.

For the practical application of Behrens’ distribution see Sukhatme (1938b) and Fisher
(1941a). Behrens himself stated his results explicitly only for the case of equality of sample
number, $n_1 = n_2$, the extension being given by Fisher (1935b).

is distributed in “Student’s” form with $\nu = n - 1$. Hence show that fiducial limits
for $x$ are



where $t_1$ is chosen so that the integral of “Student’s” form between $-t_1$ and $t_1$ is an
assigned probability $\alpha$.

(Fisher, 1935b. This gives an estimate of the next value when $n$ values have
already been chosen, and extends the idea of fiducial limits from parameters
to variates dependent on them.)

20.2. Show similarly that if a sample of $n_1$ values gives mean $\bar{x}_1$ and estimated variance
$s_1^2$, the fiducial distribution of mean $\bar{x}_2$ and estimated variance $s_2^2$ in a second sample of $n_2$ is

(?ii — I) Sj* -f (Wj — 1) {Xi — jj)* 

Hence, allowing $n_2$ to tend to infinity, derive the simultaneous fiducial distribution of
$\mu$ and $\sigma$.

(Fisher, 1935b.)





CHAPTER 21 

SOME COMMON TESTS OF SIGNIFICANCE 


Tests of Significance 

21.1. We now pass from the problem of estimation to that of significance. The
two are closely allied and in practical problems they both arise together as a rule ; but 
it is useful to preserve a distinction between them. In estimation we try to find, with 
greater or less accuracy, the value of some parameter in a population which is known to 
be (or assumed to be) dependent on that parameter. In tests of significance we are given 
some value of a parameter beforehand and wish to decide whether it is acceptable in the 
light of the evidence. This is the distinction in its simplest terms, but of course the 
associated problems become increasingly complex when several parameters are concerned. 

21.2. From one point of view the problem of significance is logically anterior to that
of estimation. Suppose we have records of the yields of two varieties of wheat grown
under similar conditions, and are interested in a comparison of the average yields of the
two. Our first question is whether the observed mean yields indicate any difference between 
the varieties — a matter of significance. Not until significant differences are established 
does our interest turn to the magnitude of the difference — a matter of estimation. Again, 
if we have a set of records of only one variety, our primary problem may be to decide 
whether they are consonant with the hypothesis of normality in the parent population, 
whatever its mean and variance ; and only when this point has been settled affirmatively 
do we proceed to estimate those parameters. 

Nevertheless, we have lost very little by taking the problem of estimation first. In 
some practical problems the question of significance is already decided, and in many others 
we use estimates of parameters to test the significance of the latter, in which case estimation 
and significance become different aspects of the same statistical fact. 

21.3. We shall consider the general theory of testing statistical hypotheses in Chapters
26 and 27. That theory is, however, rather abstract, and we anticipate it to some extent 
in this chapter by giving an account of the principal tests in current use, without for the 
moment going too deeply into their rationale. It will be seen later that there are sometimes 
many significance tests which can be applied to the same problem, and that it is possible 
to lay down criteria for deciding which, if any, are the “ best ”. This aspect of the subject 
will not concern us for the present. We shall not discuss whether the tests we describe 
are the best possible (though some of them, in fact, are so) but shall merely present them 
as useful and convenient, albeit perhaps not unique, solutions of our problems. 

21.4. Developments in statistical theory in the last two decades have resulted in
a great many tests of significance appropriate to special problems. It is not easy to classify
them and quite impossible to deal extensively with them all. We shall consider them
under the following heads:

(a) Tests of the significance of a specified parameter value. The typical hypothesis
here is that a parameter in a population of known form has a specified value (usually
zero). We wish to know whether the evidence provided by the sample supports the
hypothesis or not.

(b) Tests of goodness of fit. The hypothesis is that the population is of a certain
kind which is either fully specified beforehand or can be “estimated” with the help
of the data. We wish to know whether the sample values fit this population in the
sense that they could have arisen from it by random sampling to any acceptable degree
of probability. This hypothesis is more general than that of (a) since it concerns
the whole distribution function and not merely one of its parameters.

(c) Tests of homogeneity. The hypothesis here concerns two or more populations,
each providing a contribution to the sample. We wish to test whether the populations
have certain parameters in common, or in the extreme case, whether they are identical.
This case can be regarded as an elaboration of (a) where several parameters are
simultaneously tested. In the particular case when only two populations are concerned
we may sometimes reduce it directly to type (a) by considering differences; e.g. if
we are making a comparison of parent means the hypothesis might be that the single
difference of means is zero.

In addition we shall also consider two sets of tests of rather a different kind:

(d) Tests of order of occurrence. The hypothesis here is that the sample members
occurred in random order, and we wish to ascertain whether the observed order indicates
any systematic effects, as, for instance, whether there are any cyclical effects in
time-series. The test here is of the sampling process rather than of parameters of the
parent population.

(e) Conditional tests. The hypothesis may be any one of the above types, but
we restrict the inference to a sub-population for which certain qualities are determined
by the observed sample values. For instance, we may use the distribution
of the sample variance $s^2$ for which the mean $\bar{x}$ is equal to the observed value. In
short the variation of sample values is conditioned. Type (d) may from some points
of view be regarded as a particular case of this type.

It is not intended to convey that the above five categories are mutually exclusive.
A test of type (a) may, for example, be conditional or non-conditional. The classification
will, however, provide some sort of articulation for a rather long chapter and serve to
explain our sequence of treatment.

Standard Errors

21.5. For large samples the test of significance of a parameter can usually be carried
out by standard errors. We find an estimator $t$ of the parameter $\theta$ and consider whether
the given value of $\theta$ falls in the range $t_0 \pm k\sqrt{\operatorname{var} t}$, where $t_0$ is the value of $t$ for the observed
sample and $k$ is a constant chosen at will according to a probability $\alpha$. If so we may accept
the value of $\theta$, at least so far as this test is concerned; if not, we reject it.

If the variance of $t$ does not depend on unknown quantities such as other parameters,
this type of inference is justifiable as an application of the theory of confidence intervals.
In accepting $\theta$ when it falls in the range $t_0 \pm k\sqrt{\operatorname{var} t}$ we shall be right in proportion $\alpha$ of
the cases in the long run. As a refinement we may, of course, use non-central intervals
and locate $\theta$ in an asymmetrical range $t_0 - k_0\sqrt{\operatorname{var} t}$ to $t_0 + k_1\sqrt{\operatorname{var} t}$. The test of significance
is equivalent to the estimation of the true value of $\theta$; and it will clearly be better
if the range of estimation is narrower, for then we reject more wrong values of $\theta$.
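
The central form of the standard-error test can be sketched in a few lines. This is a minimal illustration, assuming the estimator is approximately normal so that $k$ may be taken from the normal distribution; the function name `standard_error_test` and the numerical inputs are hypothetical and do not come from the text.

```python
import math
from statistics import NormalDist


def standard_error_test(t_obs, theta0, var_t, alpha=0.95):
    """Accept theta0 if it lies in t_obs +/- k*sqrt(var_t).

    alpha is the proportion of cases in which we wish to be right;
    k is taken from the normal distribution (an assumption that is
    reasonable only for large samples).
    """
    k = NormalDist().inv_cdf(0.5 + alpha / 2)   # about 1.96 for alpha = 0.95
    half_width = k * math.sqrt(var_t)
    interval = (t_obs - half_width, t_obs + half_width)
    accepted = interval[0] <= theta0 <= interval[1]
    return interval, accepted


# Hypothetical figures: observed estimate 10.3 with sampling variance 0.04.
interval, accepted = standard_error_test(10.3, 10.0, 0.04)
print(interval, accepted)
```

A narrower range (smaller variance of the estimator) rejects more wrong values of the parameter for the same $\alpha$, which is the sense in which one test is better than another here.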

21.6. If the variance of the estimator $t$ depends on unknown parameters $\theta_2, \ldots, \theta_k$
we can usually substitute estimates of those parameters obtained from the sample itself,
provided that the sample is large. For example, we have for normal samples

$$\operatorname{var} \bar{x} = \frac{\sigma^2}{n}.$$

The sample standard deviation $s$ will differ from $\sigma$ by a quantity of order $1/\sqrt{n}$, so that
to that order

$$\operatorname{var} \bar{x} = \frac{s^2}{n}.$$

The approximation breaks down for small samples, and more accurate methods are required.

21.7. The use of standard errors in testing significance has been illustrated in previous
chapters and we need not enlarge on the process further. We may, however, remark:

(a) That if the distribution of an estimator $t$ tends to normality for large samples
irrespective of the parent form (as, for instance, is the case with the mean and other moments
under very general conditions), it is not necessary that the hypothesis should specify the
parent form. In short, our test of significance is independent of the parent, a valuable
property.

(b) That we have justified the logic of reasoning involving the use of standard errors
by the theory of confidence intervals (and a similar justification can be given in terms
of fiducial intervals if we use an efficient estimator for which the loss of information tends
to zero relative to the total information in large samples). This appears to be the most
satisfactory basis for the use of standard errors. The usual intuitive basis advanced
(necessarily) in introductory textbooks is not easy to defend. For instance, there is no
obvious reason why we should base our inference on the improbability of the observed
deviation or greater values of it, namely on the improbability of something which has not
occurred (see 21.55 below). Our present approach shows that in fact the use of standard
errors can be justified logically without invoking a new principle of inference.

Significance of the Mean in Normal Samples

21.8. Suppose we have a sample from a parent population which is known to be
normal, but of whose mean and variance we are ignorant. We wish to test the significance
of a given value of the mean, that is to say, we wish to consider whether the observations
could, to any acceptable probability, have been derived from a population with mean $\mu_0$,
whatever the variance may be.

We calculate the statistic

$$t = \frac{(\bar{x} - \mu_0)\sqrt{n-1}}{s} \qquad (21.1)$$

all the quantities in which are given. We know that the distribution of $t$ is

$$dF = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)}\,\frac{dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu+1)}}, \qquad \nu = n - 1, \qquad (21.2)$$

and hence can find the probability that our calculated value of $t$ is attained or exceeded.
If this is small we reject $\mu_0$; if not, we accept it. What values are regarded as small
for this purpose is a matter of convention, but the most frequently used values are 0.05,
0.01 and 0.001.

From the work of the previous two chapters it will be evident that this type of inference
is the confidence- or fiducial-interval approach in a slightly different form. Given
$\alpha$ we can find $-t_1$ and $t_0$ such that the integral of $dF$ in (21.2) between those limits is $\alpha$.
This gives us confidence or fiducial limits to $\mu$ of the type $\bar{x} - t_0 s/\sqrt{n-1}$ and
$\bar{x} + t_1 s/\sqrt{n-1}$; and if $\mu_0$ lies in this range we accept it. In particular cases we may
have $t_0 = t_1$, in which case the intervals are central and $1 - \alpha$ is the chance of $t$ being
attained or exceeded in absolute value; or $t_0 = +\infty$, in which case $\alpha$ is the chance that
$-t_1$ will be attained or exceeded, and no lower limit to $\mu_0$ is imposed.
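
The passage from limits on $t$ to limits on the mean can be written out explicitly. The following is a sketch assuming the statistic of (21.1), with $s$ the sample standard deviation on divisor $n$, and limits $-t_1 \le t \le t_0$; the labelling of the two limits is an assumption made for illustration.

```latex
% Start from the probability statement on t:
-t_1 \;\le\; \frac{(\bar{x} - \mu)\sqrt{n-1}}{s} \;\le\; t_0 .

% Multiply through by s/\sqrt{n-1} > 0 and solve for \mu:
\bar{x} - \frac{t_0\, s}{\sqrt{n-1}} \;\le\; \mu \;\le\; \bar{x} + \frac{t_1\, s}{\sqrt{n-1}} .

% Central case, t_0 = t_1 = t_\alpha:
\bar{x} - \frac{t_\alpha\, s}{\sqrt{n-1}} \;\le\; \mu \;\le\; \bar{x} + \frac{t_\alpha\, s}{\sqrt{n-1}},
\qquad P\{\,|t| \ge t_\alpha\,\} = 1 - \alpha .

% One-sided case, t_0 \to +\infty: the lower limit disappears and
\mu \;\le\; \bar{x} + \frac{t_1\, s}{\sqrt{n-1}},
\qquad P\{\, t \ge -t_1 \,\} = \alpha .
```

The hypothetical value $\mu_0$ is accepted exactly when it satisfies the inequality for $\mu$, so the test and the interval are the same statement read in two directions.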


Example 21.1 

The weights of fifteen bags of sugar taken from a filling machine are found to be, in
ounces, 16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9, 16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8.
Each bag should be 16 ounces, but some deviation is inevitable. One of the manufacturer’s
problems, of course, is to keep this deviation to a minimum, but that is not the
point we now consider. Our question is: if the machine is supposed to be giving weights
of 16 ounces on the average, does the sample suggest that it is failing in its purpose?

The hypothesis is that the parent mean is 16 ounces and the deviations from this
mean are, in order of magnitude, −0.3 (twice), −0.2 (four times), −0.1 (twice), 0.0
(four times), 0.1 (twice), 0.2 (once). The sample mean of these deviations is thus −0.08 and
to that extent the average of the sample is slightly underweight. Is this a significant effect?

It will be found that $s^2 = 0.0216$ so that

$$t = \frac{0.08}{\sqrt{0.0216}}\sqrt{14} = 2.04, \qquad \nu = 14.$$

From Appendix Table 3 (vol. I, p. 440) we find that for $\nu = 14$ the probability of a deviation
greater in absolute magnitude than 2.04 is about $2(1 - 0.969) = 0.062$. This is small,
but whether we regard it as significant or not depends on the probabilities we are prepared
to consider as defining significance. The usual values are 0.05 and 0.01, and with such
criteria we should not take the observed value as significant, though it arouses suspicions.
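
The arithmetic of this example can be checked with a short script. It is a sketch only: it assumes that $s^2$ is computed with divisor $n$ (which reproduces the quoted 0.0216) and it replaces the printed table by a numerical integration of (21.2) using Simpson's rule, a choice made here for self-containment. The helper names are hypothetical.

```python
import math

# Weights of the fifteen bags (ounces), as listed in the example.
weights = [16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9,
           16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8]
mu0 = 16.0


def student_pdf(t, nu):
    """Ordinate of "Student's" form (21.2) with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - 0.5 * (nu + 1) * math.log1p(t * t / nu))


def student_cdf(t, nu, steps=2000):
    """Distribution function via Simpson's rule on [0, |t|] and symmetry."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps                      # steps is even, as Simpson's rule needs
    total = student_pdf(0.0, nu) + student_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


n = len(weights)
mean = sum(weights) / n
s2 = sum((w - mean) ** 2 for w in weights) / n     # divisor n (assumption)
nu = n - 1
t_stat = (mean - mu0) * math.sqrt(nu) / math.sqrt(s2)
p_two_sided = 2 * (1 - student_cdf(abs(t_stat), nu))

print(round(s2, 4), round(t_stat, 2), round(p_two_sided, 3))
```

The computed $t$ is negative because the sample mean falls short of 16 ounces; its magnitude agrees with the 2.04 of the text, and the two-sided probability is close to the tabular figure quoted there.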

We have here used central intervals, which are usual for the $t$-test of significance
of the mean; but it is easy to imagine circumstances in this particular case for which
non-central intervals might be required. For instance, if the machine was at fault and
had a true mean filling weight of more than 16 ounces the manufacturer would be giving
sugar away for nothing. This might be serious, but probably not so serious as if the
machine was erring in the other direction, which would render him liable to prosecution
for selling short weight. Suppose he assessed the latter risk as nine times as serious as
the former and was working to a probability level of 0.05. Then he would require
the probability of a negative value of $t$ greater than the significance value to be
0.955 (= 1 − 0.045) but could allow that of a positive value less than the significance value
to be 0.995 (= 1 − 0.005). From Appendix Table 3 we see that this corresponds to
deviations of approximately −1.8 and +3.0. Our observed value is outside this range
and is thus significant. Small as the average shortage is, it would be prudent to overhaul
the machine and to make sure that it is giving fair weight on the average.
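
The non-central limits can be recovered numerically by inverting the distribution function. The sketch below assumes tail areas of 0.045 below and 0.005 above, as in the example, and finds the corresponding points by bisection; the integration and root-finding methods and all function names are choices made here, not the book's procedure.

```python
import math


def student_pdf(t, nu):
    """Ordinate of "Student's" form with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - 0.5 * (nu + 1) * math.log1p(t * t / nu))


def student_cdf(t, nu, steps=2000):
    """Distribution function via Simpson's rule on [0, |t|] and symmetry."""
    a = abs(t)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = student_pdf(0.0, nu) + student_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def student_quantile(p, nu, lo=-50.0, hi=50.0):
    """Value of t whose distribution function equals p, by bisection."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if student_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


nu = 14
lower = student_quantile(0.045, nu)    # 4.5 per cent. in the lower tail
upper = student_quantile(0.995, nu)    # 0.5 per cent. in the upper tail
t_obs = -2.04                          # observed value, negative for a shortage
significant = not (lower <= t_obs <= upper)

print(round(lower, 2), round(upper, 2), significant)
```

The two points come out close to the −1.8 and +3.0 read from the table, and the observed value lies below the lower one.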

We may note further that if the sample had occurred in the order

15.7, 15.7, 15.8, 15.8, 15.8, 15.8, 15.9, 15.9, 16.0, 16.0, 16.0, 16.0, 16.1, 16.1, 16.2

we should almost certainly have concluded that there was something wrong with the
machine, for the weights are steadily rising. The $t$-test would give the same result for
this sample as for the first, since it does not depend on the order of occurrence of the
members. Where, therefore, the appearance of individual sample members is ordered in time,
the $t$-test alone may fail to reveal significant effects due to the changing of the population
between drawings. Our data are still such as could have arisen at a single drawing of
fifteen members from a population with mean equal to 16 ounces; but the data throw
doubt on the point whether we are really asking the right question in assuming that they
all came from the same population. We consider the point again below (21.41).

Before leaving this example, we may note another possible test, cruder than the $t$-test
but sometimes useful. If the parent mean were really zero, positive and negative deviations
should occur equally frequently in the long run. In our present case there are 8
negative deviations, 3 positive ones and 4 zero. If we allot, conventionally, two of the
last to each group we have 10 negative and 5 positive deviations. The expected number
is $7\tfrac{1}{2}$, so that the deviation is $2\tfrac{1}{2}$, with a standard error of $\sqrt{15 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 1.94$. The
observed deviation is very little in excess of this, so we conclude that the preponderance
of negative signs in the sample is not significant of a negative mean in the population.
More exactly, we find that the occurrence of 5 or fewer positive deviations is the sum of
the first six terms in the binomial $(\tfrac{1}{2} + \tfrac{1}{2})^{15}$, namely 0.151, leading to the same conclusion.
The test is a very rough one since it pays no attention to the magnitude of the deviations;
but it has the advantage of applying to any symmetrical form of parent population for
finite samples.
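
The sign count and its binomial probability are easily verified. This is a small sketch using only the counts given in the example; the variable names are illustrative.

```python
import math

n = 15
negatives, positives, zeros = 8, 3, 4

# Conventional allotment of the zero deviations, half to each sign.
pos = positives + zeros // 2          # 5
neg = negatives + zeros // 2          # 10

expected = n / 2                      # 7.5 of each sign on the hypothesis
deviation = neg - expected            # 2.5
std_error = math.sqrt(n * 0.5 * 0.5)  # sqrt(15 * 1/2 * 1/2)

# Sum of the first six terms of the binomial (1/2 + 1/2)^15,
# i.e. the chance of 5 or fewer positive deviations.
tail = sum(math.comb(n, k) for k in range(pos + 1)) / 2 ** n

print(deviation, round(std_error, 2), round(tail, 3))
```

Because only the signs enter, the calculation is unaffected by the form of the parent so long as it is symmetrical about the hypothetical mean.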


Properties of the t-Distribution 

21.9. “Student’s” distribution has numerous applications in the testing of significance
apart from the one just considered, and we proceed to study its properties.

The form (21.2) is a Pearson Type VII and may be transformed to the Beta-distribution
(Type I) by the substitution $x = 1\big/\left(1 + t^2/\nu\right)$. The distribution function of $t$ may thus
be obtained direct from the B-function. For instance, we have



whence 


(21.3)

The values of the argument for which $I$ has the values 0.50, 0.25, 0.10, 0.05, 0.025, 0.01,
0.005 and $\nu = 1\,(1)\,30, 40, 60, 120, \infty$, have been tabled to five significant figures by C. M.
Thompson and others (1941a) and can hence be used to derive the values of $t$ corresponding
to those probability levels.


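21.10. Except for special purposes, however, the use of the B-function is unnecessary,
since the distribution function of $t$ itself and tables based thereon are available.

We have

$$-\log\left(1 + \frac{t^2}{\nu}\right) = -\frac{t^2}{\nu} + \frac{t^4}{2\nu^2} - \cdots + \frac{(-1)^j t^{2j}}{j\,\nu^j} + \cdots$$

and hence

$$-\frac{\nu+1}{2}\log\left(1 + \frac{t^2}{\nu}\right) = -\tfrac{1}{2}t^2 + \cdots + \frac{(-1)^j\left\{(j+1)\,t^{2j} - j\,t^{2j+2}\right\}}{2j(j+1)\,\nu^j} + \cdots \qquad (21.4)$$

Further, from the expansion for $\log\Gamma(1+x)$ we find

$$\log\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\tfrac{1}{2}\nu}\;\Gamma\left(\frac{\nu}{2}\right)} = -\frac{1}{4\nu} + \frac{1}{24\nu^3} - \frac{1}{20\nu^5} + \cdots \qquad (21.5)$$

Now as $\nu$ tends to infinity, $t$ tends to the normal form with zero mean and unit variance.
Writing

$$z = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t^2}$$

we find for the logarithm of the ordinate of (21.2), in descending powers of $\nu$,

$$\log z + \frac{1}{4\nu}\left(t^4 - 2t^2 - 1\right) - \frac{1}{12\nu^2}\left(2t^6 - 3t^4\right) + \frac{1}{24\nu^3}\left(3t^8 - 4t^6 + 1\right) - \frac{1}{40\nu^4}\left(4t^{10} - 5t^8\right) + \frac{1}{60\nu^5}\left(5t^{12} - 6t^{10} - 3\right) + \cdots \qquad (21.6)$$

Taking the exponential and integrating from $t$ to $\infty$, we find

$$1 - F = \int_t^\infty z\,dt + z\left[\frac{(t^2+1)\,t}{4\nu} + \frac{(3t^6 - 7t^4 - 5t^2 - 3)\,t}{96\nu^2} + \frac{(t^{10} - 11t^8 + 14t^6 + 6t^4 - 3t^2 - 15)\,t}{384\nu^3} + \frac{(15t^{14} - 375t^{12} + 2225t^{10} - 2141t^8 - 939t^6 - 213t^4 - 915t^2 + 945)\,t}{92160\,\nu^4} + \cdots\right] \qquad (21.7)$$

This is the expression, due to Fisher, which was used by “Student” himself in calculating
the distribution function of $t$ given in Appendix Table 3, Vol. I. For values of $\nu \geq 18$ the
first four terms of (21.7) give $F$ to an accuracy of about 0.000005.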
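
The quality of the expansion can be examined numerically. The sketch below assumes the four correction terms in the form $z\,t\,P_j(t^2)/\nu^j$ with the polynomial coefficients 1/4, 1/96, 1/384 and 1/92160, and compares the result with a direct Simpson's-rule integration of the density; the function names and the test point $\nu = 20$, $t = 2$ are illustrative choices.

```python
import math


def student_pdf(t, nu):
    """Ordinate of "Student's" form with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - 0.5 * (nu + 1) * math.log1p(t * t / nu))


def exact_upper_tail(t, nu, steps=2000):
    """1 - F(t) for t >= 0, by Simpson's rule on [0, t]."""
    if t == 0.0:
        return 0.5
    h = t / steps
    total = student_pdf(0.0, nu) + student_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_pdf(i * h, nu)
    return 0.5 - total * h / 3


def fisher_upper_tail(t, nu):
    """Normal tail plus four correction terms in descending powers of nu."""
    z = math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
    u = t * t
    p1 = (u + 1) * t / 4
    p2 = (3 * u**3 - 7 * u**2 - 5 * u - 3) * t / 96
    p3 = (u**5 - 11 * u**4 + 14 * u**3 + 6 * u**2 - 3 * u - 15) * t / 384
    p4 = (15 * u**7 - 375 * u**6 + 2225 * u**5 - 2141 * u**4
          - 939 * u**3 - 213 * u**2 - 915 * u + 945) * t / 92160
    normal_tail = 0.5 * math.erfc(t / math.sqrt(2))
    return normal_tail + z * (p1 / nu + p2 / nu**2 + p3 / nu**3 + p4 / nu**4)


nu, t = 20, 2.0
series = fisher_upper_tail(t, nu)
exact = exact_upper_tail(t, nu)
normal_only = 0.5 * math.erfc(t / math.sqrt(2))

print(series, exact, normal_only)
```

At this point the normal tail alone is visibly too small, while the four-term series agrees with the direct integration to within the accuracy suggested in the text.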

21.11. Tables are also available in the “inverse” form, that is to say, giving values
of $t$ corresponding to specified values of $\nu$ and $F$. Such tables may be derived by
interpolation from the “Student” tables or by the normalisation method of 6.32. In work
involving tests of significance this type of table is perhaps the most convenient, since it
enables one to decide without calculation (other than interpolation for values of the
argument not covered by the tables) whether particular values are significant for chosen
probability $\alpha$. The complement of the probability $\alpha$ is spoken of as a level of significance
and expressed either as a number between 0 and 1 or as a percentage. Similarly the
corresponding values of $t$ are called significance points, and we may speak, for example,
of the 5 per cent. value of $t$, meaning that value for which $F$ is 0.95.

Fisher and Yates (1938a) give the values of $t$ for $\nu = 1\,(1)\,30, 40, 60, 120$ and $\infty$ and
$P = 0.9\,(0.1)\,0.1, 0.05, 0.02, 0.01, 0.001$. These tables, it should be remembered,
give the significance points corresponding to twice $1 - F$, that is to say the values of $t$
such that the proportion of the distribution outside the range $\pm t$ is $P$.


21.12. The number $\nu$ is usually called the number of degrees of freedom of $t$. This
is an expression which occurs in other connections, and a few words of explanation are
desirable.

It has been seen that the variance of a normal sample is distributed like the sum of
$(n - 1)$ squares of independent variates (compare Example 10.5, vol. I, p. 238) and generally,
that if there are $k$ linear relations connecting the original variates, the sum of squares
of the originals is distributed as the sum of squares of $n - k$ independent normal variates of equal
variance. Each linear relation reduces the freedom of the variation, as it were, by unity.
It is thus natural to speak of the number of degrees of freedom, $\nu$, of a function such as
$\chi^2$, meaning thereby that it is distributed as the sum of squares of $\nu$ independent
normal variates with equal variance. The expression only has this natural meaning when
normal variation is concerned.

It so happens that the quantity $t$ depends on a parameter $\nu$ which is convenient for
tabulating its distribution function and is also the number of degrees of freedom of the
statistic $s^2$ entering into the denominator of $t$. $\nu$ may thus, by an extension of the term,
be called the number of degrees of freedom of $t$, but this usage does not imply that $t$ is
distributed as the sum of squares of normal variates.


Distribution of t in Non-normal Case 

21.13. Part of the price we have to pay for the precision of the $t$-test in small samples
is the assumption of normality in the parent. If the population is not normal we may still
of course consider the distribution of “Student’s” ratio, which will remain independent
of the scale parameter; but complications appear because the parameters which express
the form of the parent appear in the sampling distribution. Furthermore, the
distributions of $\bar{x}$ and $s$ are no longer independent.

A result due to Geary (1936b) may be stated in the form: if the mean and variance in
samples from a population are independent and the population has finite cumulants, it
must be normal.

From 11.13 we have

$$\kappa(2\,1^r) = \frac{\kappa_{r+2}}{n^r}, \qquad r > 0.$$

If mean and variance are independent, $\kappa(2\,1^r) = 0$ and hence $\kappa_{r+2} = 0$ for $r > 0$. Thus
all cumulants above the second vanish and the parent is normal.
relatir^ the Type . yTr'o “ that we have not' had to use 

assume ^inlrenroVf': oVeT^e :G:mp,e™® ^ "-'y 

21.14. In the notation of Chapter 11 we write

X-\/v lci\/v 


-y/Ka ( 1 -f- 


/Co — /fa 




and expand in terms of powers of 


Ji 


tea — /c® 


The method follows that of 11.23 and we 


Ka 


find for the moments of t about the parent mean, assumed zero, to order v ^ 

fi'i = 


4-1 P3 + t|- (2^3 -- 

y v [167^ 


= 1 _!_ ? (1 + + 4 X, + 6Ai X,) 

" V ' V“ 

= _ 4- I "Aa -r 4 ( 210 A 3 - 66 A 0 H- 105^3 A 4 + 210 Ai) 

y^v [2 Ibv 

fi' = 3 -f- 4^ — A 4 14A|) + — (102 — 3 OA 4 -j- 24 A 5 

' V r“ 


-j- I2OA3 ‘^Ab — I32A3 A5 — 6A4 + I68A3 A4 -f I2OA3) 


( 21 . 8 ) 


where 


X = 

'■ K.y 


, .20 2 A 4 

u., = 1 - 1 - --+ -- — — 

V V- r " 


//., = 3 


If the parent form is symmetrical, cumulants of odd order vanish and we have, to 
order r and first order terms in the A’s — 

/h = fh = 0 

_ r - 1 _ 2 A 4 , 

• • ~ ,, ATi 72 ' j- (21.9) 

18 102 2A„ 'idA^ , ^ 

,,2 .j ,2 ... __ __ ^,2 

Except for the term in $\lambda_4$ these are the values of the moments of $t$ in “Student’s”
distribution, and it follows that for symmetrical parents which are not excessively lepto-
or platykurtic we should not expect the $t$-test to be invalidated. If the parent is skew
the situation may be different.

21.15. The general skew case has been considered by E. S. Pearson and Adyanthaya
(1928, 1929) from the experimental viewpoint and by Bartlett (1935a) and Geary (1936b)
from the theoretical viewpoint. Various writers have derived exact distributions of $t$
in non-normal samples, but the sample numbers are, as a rule, trivially small and the
results of little practical value. Geary considers the population expressed by the first
two terms of the Gram-Charlier series

$$dF = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\left\{1 - \frac{\kappa_3}{6}\left(3x - x^3\right)\right\}dx \qquad (21.10)$$

and assumes that powers of $\kappa_3$ above the first may be neglected. He finds (cf. Exercise
21.1) that the frequency function of $t$ in this population is equal to the “Student” form
plus a corrective factor

{3?/- t^{2v + 1 )} ^ 

1 + 

V 


/fa 


6r y { 2jt {v -\- 1 ) } 


(21.11)

The integral of this factor from $-\infty$ to $-t$ is


6 


7(rdnr.)(-r"('- 


?Li_‘ i.y 


( 21 . 12 ) 


giving the correction to be applied. (Geary gives a table for some representative values.)
This, of course, depends on $\kappa_3$, but even where exact knowledge of the skewness is not
available we may sometimes safeguard against error by considering the correction for
plausible values of $\kappa_3$.


Other Uses of the t-distribution 

21.16. The usefulness of “Student’s” $t$ derives from the fact that it is independent
of the scale parameter, and the simplicity of its distribution from the fact that it is the
ratio of two independent variates, the numerator distributed normally and the denominator
distributed in the Type III form. We shall see below (21.26) that these properties can
be used to test the difference of two means in normal populations with equal variance,
and in Chapter 22 we shall encounter a test of regression coefficients which is based on
the same properties.

We have also noted that “Student’s” form can be used to test the significance of the
product-moment correlation (14.15) and the Spearman rank correlation $\rho$ (16.18). These
facts are, however, in a sense accidental. They do not derive from the expression of the
parameters concerned as the ratio of a normal to a Type III variate, but from the simpler
fact that the distributions are of the Type II form (symmetrical with finite range) and
hence can be transformed to the “Student” distribution, which is of Type VII. Symmetrical
distributions of finite range can often be represented very approximately by a
transformation to the “Student” form, especially if they tend to normality.


Test of a Variance in Normal Samples 

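21.17. The distribution of the sample variance in normal samples is

$$dF = \frac{1}{\Gamma\left(\frac{n-1}{2}\right)}\left(\frac{n}{2\sigma^2}\right)^{\frac{1}{2}(n-1)}\left(s^2\right)^{\frac{1}{2}(n-3)}\exp\left(-\frac{ns^2}{2\sigma^2}\right)d(s^2), \qquad 0 \leq s^2 < \infty. \qquad (21.13)$$

Thus, given for consideration a value of $\sigma^2$ and an observed $s^2$, we can find the probability
that $s^2/\sigma^2$ is attained or exceeded and accept or reject $\sigma^2$ in the usual way. The distribution
function of (21.13) may be expressed as an incomplete $\Gamma$-function, or more conveniently
for statistical purposes in terms of $\chi^2$ $(= ns^2/\sigma^2)$, $\nu = n - 1$.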


Example 21.2 

In Example 21.1 we found $s^2 = 0.0216$, $\nu = 14$. Could the data have arisen by chance
from a population in which the true variance is 0.01?

We have $\chi^2 = ns^2/\sigma^2 = 32.4$, $\nu = 14$. From the diagram on p. 446 of vol. I we see
that the probability of such a value or greater is between 0.01 and 0.001, a very improbable
result; and hence we reject $\sigma^2 = 0.01$ as a value of the parent variance.

Once again this type of inference can be justified by the theory of confidence intervals
since the probability

$$P\left\{\frac{ns^2}{\sigma^2} \geq 32.4\right\} < 0.01$$

is equivalent to

$$P\left\{\sigma^2 \leq \frac{ns^2}{32.4}\right\} < 0.01.$$

In asserting that $\sigma^2$ was less than $ns^2/32.4$ (in our present case 0.01) we should be wrong
more than 99 times in 100 on the average.
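
The probability read from the diagram can be computed directly, since for an even number of degrees of freedom the upper tail of $\chi^2$ is a finite sum. The sketch below assumes $\chi^2 = ns^2/\sigma^2$ with $n = 15$ and uses that closed form; the function name is hypothetical and the formula applies only when $\nu$ is even.

```python
import math


def chi2_upper_tail_even(x, nu):
    """P(chi-square > x) for an EVEN number of degrees of freedom nu.

    Uses the finite sum exp(-x/2) * sum_{k < nu/2} (x/2)^k / k!.
    """
    if nu % 2 != 0:
        raise ValueError("this closed form needs an even nu")
    half = x / 2
    term = 1.0
    total = 1.0
    for k in range(1, nu // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


n = 15
s2 = 0.0216            # sample variance (divisor n) from Example 21.1
sigma0_sq = 0.01       # hypothetical parent variance under test
chi2 = n * s2 / sigma0_sq
p_value = chi2_upper_tail_even(chi2, n - 1)

print(round(chi2, 1), p_value)
```

The probability falls between the 0.001 and 0.01 levels, in agreement with the reading from the diagram.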

There is a point of interest to note here. In Example 21.1 we considered a hypothesis
as to the mean and in the present example a hypothesis as to the variance. Had we
considered the two together, that is to say the compound hypothesis that $\mu = 16$ and
$\sigma^2 = 0.01$, we should have been in difficulties in justifying our procedure by reference to
confidence or fiducial intervals, since we could no longer assert that our conclusions were
right in an assigned proportion of cases. We have avoided this complication by considering
separately the hypotheses (a) that $\mu = 16$ whatever the variance, and (b) that
$\sigma^2 = 0.01$ whatever the mean. This resource is not as a rule open to us where non-normal
variation is concerned.


Tests of Normality 

21.18. In large samples we can group the data into ranges and compare the actual 
frequencies with those to be expected on the hypothesis of parent normality. This com- 
parison over the course of the frequency function is not satisfactory for small samples 
unless the grouping is so broad as to deprive the test of most of its efficacy. An alter- 
native is to compute some statistic of the sample and to examine how far it departs from 
the mean value to be expected on the hypothesis of parent normality. 

Consider, for instance, the statistic

$$t = \frac{k_3}{k_2^{3/2}} \qquad (21.14)$$

This is independent of the mean (because the $k$-statistics are so) and is also independent
of the scale parameter because it is “studentised”. In normal samples, therefore, the
distribution of $t$ is independent of mean and variance and thus depends only on the sample
number $n$. We have already given formulae for its mean and variance (Exercise 11.16,
vol. I, p. 289). In fact,

$$\mu_1'(t) = \mu_3(t) = 0, \qquad \mu_2(t) = \frac{6n(n-1)}{(n-2)(n+1)(n+3)}. \qquad (21.15)$$

Since the distribution of $t$ is symmetrical we may, for moderate $n$, consider it as normally
distributed with zero mean and variance given by (21.15), and this will provide a test,
of a somewhat approximate kind, of normality in the parent from which the sample is
derived.

Example 21.3 

In the data of Examples 21.1 and 21.2 we have, for the sample moments about origin
16, in units of 0.1,

$$m_1' = -0.8, \qquad m_2 = 2.16, \qquad m_3 = 0.496,$$

whence

$$k_2 = \frac{n}{n-1}\,m_2 = 2.31429, \qquad k_3 = \frac{n^2}{(n-1)(n-2)}\,m_3 = 0.61319, \qquad t = \frac{k_3}{k_2^{3/2}} = 0.174.$$

The variance of $t$, from (21.15), is 0.3365 and its standard error accordingly about
0.58. The observed deviation from zero is considerably less than this, and we see no reason
to doubt the hypothesis of normality so far as this test is concerned.
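
The quantities in this example can be recomputed from the raw weights. The sketch assumes the statistic is $k_3/k_2^{3/2}$ with the usual $k$-statistic multipliers $n/(n-1)$ and $n^2/\{(n-1)(n-2)\}$, and evaluates (21.15) directly for $n = 15$; variable names are illustrative.

```python
import math

weights = [16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9,
           16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8]

# Deviations from the working origin 16, in units of 0.1 (exact integers).
d = [round((w - 16.0) * 10) for w in weights]
n = len(d)

m1 = sum(d) / n                                   # first moment about the origin
m2 = sum((x - m1) ** 2 for x in d) / n            # second moment about the mean
m3 = sum((x - m1) ** 3 for x in d) / n            # third moment about the mean

k2 = n / (n - 1) * m2
k3 = n ** 2 / ((n - 1) * (n - 2)) * m3
g = k3 / k2 ** 1.5                                # the studentised statistic

var_g = 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))   # formula (21.15)
se_g = math.sqrt(var_g)

print(round(k2, 5), round(k3, 5), round(g, 3), round(var_g, 4), round(se_g, 2))
```

Since the statistic is well within one standard error of zero, the conclusion of the example does not depend on the last decimal places of the variance.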

21.19. Another test of normality has been proposed by Geary (1935a), namely
the use of the ratio

$$w = \frac{\text{mean deviation}}{\text{standard deviation}}. \qquad (21.16)$$

If the parent mean is zero, the parent value of $w$ is $\sqrt{2/\pi} = 0.79788$. The test has also
been adapted to the case when the parent mean is not zero, and tables provided for the
application of the test (Geary and Pearson, 1938).

Geary's ratio is directed towards detecting deviations from mesokurtosis in the parent.
The criterion based on $k_4/k_2^2$, which is a natural extension of that for skewness based on
$k_3/k_2^{3/2}$, is not very suitable for the purpose, since it has a skew distribution for quite high
values of $n$. The distribution of Geary's ratio tends to normality fairly rapidly
(cf. Exercise 21.2).
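
As an illustration of the ratio (21.16), the following sketch computes it for a large
artificial normal sample and for a rectangular one. It is only a sketch under stated
assumptions: deviations are measured from the sample mean and both numerator and
denominator use the divisor $n$; the simulated samples are not data from the text.

```python
import math
import random


def geary_ratio(sample):
    """Mean deviation divided by standard deviation, both about the sample mean."""
    n = len(sample)
    mean = sum(sample) / n
    mean_dev = sum(abs(x - mean) for x in sample) / n
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return mean_dev / std_dev


rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
uniform_sample = [rng.random() for _ in range(100_000)]

w_normal = geary_ratio(normal_sample)    # should lie near sqrt(2/pi)
w_uniform = geary_ratio(uniform_sample)  # platykurtic parent: larger ratio
print(w_normal, w_uniform)
```

For the rectangular parent the population ratio is $\sqrt{3}/2$, so a sample value well above
0.798 points to a parent flatter than the normal.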


Tests of Goodness of Fit 

21.20. In Chapter 12 we considered in some detail the use of $\chi^2$ in testing
correspondence between observation and hypothesis. If the hypothesis specifies the theoretical
values completely no question of estimation arises, and each cell contributing to $\chi^2$ could,
if so desired, be tested separately. From this point of view $\chi^2$ compounds into a single
test a number of tests of the kind already considered.

If the hypothesis does not specify the theoretical values completely, but leaves them
to be estimated in part from the data, some modification in the $\chi^2$-test is necessary. We
can now establish a result which in 12.13 was announced without proof: if the estimators
employed are maximum likelihood estimators, then for large samples the $\chi^2$-test of
significance retains its validity, provided that the number of degrees of freedom is reduced by
unity for every parameter estimated.

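Suppose the hypothesis leaves unspecified a parameter $\theta$, and let $t$ be its maximum
likelihood estimator. Then if the theoretical frequencies based on the true value of $\theta$
are $\lambda$ and those based on $t$ are $\lambda'$, we may write, $l$ denoting an observed frequency
and $n$ the total frequency,

$$\chi^2 = \sum \frac{(l - \lambda)^2}{\lambda} = \sum \frac{l^2}{\lambda} - n, \qquad (21.17)$$

$$\chi'^2 = \sum \frac{(l - \lambda')^2}{\lambda'} = \sum \frac{l^2}{\lambda'} - n. \qquad (21.18)$$

$\chi^2$ is distributed as the sum of squares of $\nu$ normal variates with unit variance. The problem
is to find the distribution of $\chi'^2$. We have

$$\chi^2 - \chi'^2 = \sum l^2 \left( \frac{1}{\lambda} - \frac{1}{\lambda'} \right),$$

and for large samples the difference between $\lambda$ and $\lambda'$ will be small, $\delta\theta = \theta - t$
being of order $n^{-1/2}$. Thus, expanding the difference in terms of $\delta\theta$ as far as the term
in $(\delta\theta)^2$,

$$\frac{1}{\lambda} - \frac{1}{\lambda'} = -\frac{1}{\lambda'^2} \frac{\partial \lambda'}{\partial \theta}\, \delta\theta
+ \left\{ \frac{2}{\lambda'^3} \left( \frac{\partial \lambda'}{\partial \theta} \right)^2
- \frac{1}{\lambda'^2} \frac{\partial^2 \lambda'}{\partial \theta^2} \right\} \frac{(\delta\theta)^2}{2}.$$

We then have

$$\chi^2 - \chi'^2 = -\delta\theta \sum \frac{l^2}{\lambda'^2} \frac{\partial \lambda'}{\partial \theta}
+ (\delta\theta)^2 \sum \left\{ \frac{l^2}{\lambda'^3} \left( \frac{\partial \lambda'}{\partial \theta} \right)^2
- \frac{l^2}{2\lambda'^2} \frac{\partial^2 \lambda'}{\partial \theta^2} \right\}. \qquad (21.19)$$

Now for large samples the maximisation of the likelihood is equivalent to minimising
$\chi'^2$, and hence

$$\sum \frac{l^2}{\lambda'^2} \frac{\partial \lambda'}{\partial \theta} = 0,$$

and

$$\chi^2 - \chi'^2 = (\delta\theta)^2 \sum \left\{ \frac{l^2}{\lambda'^3} \left( \frac{\partial \lambda'}{\partial \theta} \right)^2
- \frac{l^2}{2\lambda'^2} \frac{\partial^2 \lambda'}{\partial \theta^2} \right\}. \qquad (21.20)$$

But the sum on the right is the reciprocal of the variance of the maximum likelihood
estimator, and writing $\delta t$ for $\delta\theta$, as is legitimate for large samples, we have

$$\chi^2 - \chi'^2 = \frac{(\delta t)^2}{\operatorname{var} t}. \qquad (21.21)$$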


The quantity on the right is itself the square of a variate which (in the limit) is normal
and has unit variance. Furthermore, its distribution is independent of that of $\chi'^2$. For
consider the spherically symmetric density-distribution of the $\nu$ normal variables whose
sum of squares composes $\chi^2$. Let $O$ be the origin and $P$ any point; then $\chi^2 = OP^2$. Now
for large samples the variation takes place in the neighbourhood of $O$. A surface of
constant $t$ through $P$ is approximately plane in the effective range of variation. If $OQ$ is the
normal to this surface,

$$OP^2 = OQ^2 + PQ^2,$$

corresponding to

$$\chi^2 = \frac{(\delta t)^2}{\operatorname{var} t} + \chi'^2,$$

for $t$ is chosen so as to minimise $\chi'^2 = PQ^2$. Thus if we take $t$ as a new co-ordinate, together
with $(\nu - 1)$ others in the surface of constant $t$, the axis of $t$ is orthogonal to the space of
constant $t$, and $t$ will be independent of $\chi'^2$.

It follows further that $\chi'^2$ is distributed as the sum of $(\nu - 1)$ squares of normal
variates. Thus the usual Type III distribution of $\chi^2$ holds for $\nu - 1$ degrees of freedom;
and so for every constant fitted, with a reduction of unity in the number of degrees for
each constant. We have already exemplified the use of the result in Example 12.4 (Vol. I,
p. 301).
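
The rule may be illustrated by a sketch in which a Poisson series is fitted to a frequency
distribution. The frequencies are illustrative and do not come from the text; the helper
names are hypothetical, and the Poisson mean is estimated from the ungrouped counts,
which is assumed to be close enough to the maximum likelihood estimate from the
grouped cells.

```python
import math


def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_cells, n_fitted):
    """One degree lost for the fixed total, one more for every constant fitted."""
    return n_cells - 1 - n_fitted


# Illustrative frequencies of 0, 1, 2, 3 and 4 events (not data from the text).
counts = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}
total = sum(counts.values())
mean = sum(k * f for k, f in counts.items()) / total  # the one fitted constant

# Pool the small cells into "3 or more" so that no expected frequency is very small.
observed = [counts[0], counts[1], counts[2], counts[3] + counts[4]]
probs = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(3)]
probs.append(1.0 - sum(probs))
expected = [total * p for p in probs]

statistic = chi_square(observed, expected)
df = degrees_of_freedom(len(observed), n_fitted=1)
print(statistic, df)
```

With four cells and one fitted constant the statistic is referred to the distribution with
two degrees of freedom, not three.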


The $\omega^2$-distribution

21.21. For small samples the $\chi^2$-test is difficult to apply, since it depends for its
validity on the fact that the binomial distribution in individual cells may be represented
by the normal distribution, and hence requires that cell-frequencies shall not be small.
An alternative test was proposed by Cramér (1928) and independently by von Mises.

Put

$$\omega^2 = \int_{-\infty}^{\infty} \{\bar F(x) - F(x)\}^2\, dx, \qquad (21.22)$$

where $\bar F(x)$ is the observed distribution function and $F(x)$ the hypothetical distribution
function. The quantity $\omega^2$ varies from sample to sample, its mean value being

$$E(\omega^2) = \frac{1}{n} \int_{-\infty}^{\infty} F(x)\{1 - F(x)\}\, dx = \frac{1}{2n} \Delta_1, \qquad (21.23)$$

where $\Delta_1$ is Gini's coefficient of mean difference (cf. 2.24). For

$$E(\omega^2) = \int_{-\infty}^{\infty} E(\bar F - F)^2\, dx.$$

For any given $x$ the expectation of $(\bar F - F)^2$ is merely the variance of the proportion $\bar F$
and hence is equal to $F(1 - F)/n$. The result (21.23) follows at once.

The $\omega^2$-test consists of comparing the observed $\omega^2$ with the mean value; but it is not
possible to express the comparison in terms of probability as the sampling distribution
of $\omega^2$ is not known.


21.22. The numerical evaluation of the integral (21.22) is tedious in the case of a
continuous distribution, and Wold (1938a) suggested a modification. If the variate
range is divided into intervals at $-\infty, a_1, a_2, \ldots, a_i, \ldots, \infty$, we define


= 27 { # (11,) )‘ . 

i 

If the intervals are all of width h, 


* -J" W ]dx + ht, 


(21.24) 


(21.25) 


If may be neglected, the zo^-test is equivalent to the 
intervals ^ ^PP y* f the data are ungrouped, the may be taken at equidistant 


In the particular case when F is normal, we have 


nE 




V(273f) 


5 - 


f 


V{^n) 


e du dv dx. 


(21.26) 


Putting u oc -i- X and v — ^ x, we find, after integration with respect to x, 

2^ JL L { “ i - m dfi- 

A further substitution of 7 = a — /? and d = oi + p gives 

1 rv 


4-^/5! 

2-^j: 

1 


r:j: 


y e 


e dy dd 
“i’'* dy 


or

21.23. An interesting modification of the $\omega^2$-test has been given by Smirnoff (1936),
who defines


CO. 


n 


{F - Fy 

J —00 


^dF. 


. (21.28) 


The difference lies in the differential element $dF$, which has the effect of rendering
the distribution of $\omega^2$ independent of $F$. It is shown that as $n$ tends to infinity the
distribution function of $\omega_n^2$ tends to the form


1 ^ 

^ J { 2 k-l)n Vi-^ sin z)’ 


(21.29) 


but this does not look a very promising formula for application in particular cases. 

Cramer (1928) has extended formula (21.27) to the goodness of fit of Gram-Charlier 
series and gives some examples of fitting to observed distributions. 


Difference of Two Means 

21.24. A common case occurring in practice is that of two independent samples of
$n_1$ and $n_2$ members from two populations which may or may not be different. We wish
to decide whether the evidence indicates a significant difference between the parent means.
This situation forms a kind of border-line case between the testing of a prior value of a 
parameter and the homogeneity tests which we shall consider below. It is a test of homo- 
geneity in the sense that we are to discuss the question whether two populations are equal 
in certain respects ; but we do not necessarily assume that they are identical, and in any 
case we can regard the problem as equivalent to the testing of a single parameter (the 
difference of the means) to see whether it is different from zero. 


21.25. For large samples we discussed the question in Example 9.10 (Vol. I, p. 226)
and gave two tests. If the hypothesis is that the parent populations are identical (a true
hypothesis of homogeneity) we may pool the samples to form a single sample and test
whether either mean differs from the mean of the total. If, however, we wish to test the
less general hypothesis that the parents have the same mean but not necessarily the same
variance, we may test the difference of means by the ordinary equation expressing the
variance of a difference in terms of the separate variances. This is not a homogeneity test
in the strictest sense of the word, but tests of such a character may conveniently be
discussed in conjunction with the other type, both for small and for large samples.


21.26. We now consider the corresponding problem when the samples are small
and the parent populations are assumed to be normal. In the first place we take the
case when the two populations have the same variance $\sigma^2$.

The sample means $\bar x_1$ and $\bar x_2$ are distributed normally with variances $\sigma^2/n_1$ and
$\sigma^2/n_2$ and means $\mu_1$ and $\mu_2$. Consequently

$$\frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sigma}$$

is distributed normally with variance $\dfrac{1}{n_1} + \dfrac{1}{n_2}$, and hence

$$\frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
\quad \text{or} \quad
\frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sigma} \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \qquad (21.30)$$

is distributed normally with unit variance about zero mean. Further, if $S_1^2$ and $S_2^2$ are
the sample sums of squares about the mean, the quantity

$$\frac{1}{\sigma^2} (S_1^2 + S_2^2) \qquad (21.31)$$

is distributed as $\chi^2$ with $n_1 + n_2 - 2$ degrees of freedom, independently of the expression
(21.30). It follows that

$$u = \frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sqrt{S_1^2 + S_2^2}}
\sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}} \qquad (21.32)$$

is distributed like "Student's" $t$ with $\nu = n_1 + n_2 - 2$ degrees of freedom. This expression
does not contain the unknown $\sigma$ and hence may be used to test the difference $\mu_1 - \mu_2$.
This result is due to Fisher (1926a).


Example 21.4 

In a class of 20 children, 10 chosen at random were given a ration of orange-juice
each day for a certain period and the other 10 a ration of milk. Their gains in weight
during the period were, in pounds:

First group : 4, 2½, 3½, 4, 1½, 1, 3½, 3, 2½, 3½
Second group : 1½, 3½, 2½, 3, 2½, 2, 2, 2½, 1½, 3

The mean increase in the first group is 2.9 pounds, and in the second 2.4 pounds. Putting
aside other explanations, one possible factor accounting for this difference is the difference
in treatments. But we wish to know in the first place whether this is significant. We
assume, then, that treatment exerted no differential effect and that the samples came
from normal populations with the same mean and variance. We find

$$\bar x_1 = 2.9, \qquad \bar x_2 = 2.4,$$

$$\sum (x_1 - \bar x_1)^2 = 9.4, \qquad \sum (x_2 - \bar x_2)^2 = 3.9.$$

Hence, from (21.32), with $\mu_1 - \mu_2 = 0$,

$$u = \frac{0.5}{\sqrt{13.3}} \sqrt{\frac{18 \times 100}{20}} = 1.30, \qquad \nu = 10 + 10 - 2 = 18.$$

From Appendix Table 3 (vol. I, p. 441) we see that such a value would be exceeded in
absolute value with probability 0.21. The difference of a half-pound between the sample
means is not significant.

We note incidentally that the sample variances, 0.940 and 0.390, differ considerably,
and shall see below how the significance of the difference may be tested. At the present
stage our conclusion as to the non-significance of the difference of means is to be regarded
with reserve, for the data themselves suggest that we have over-simplified the problem
in assuming equal variance in the two populations.
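
The computation of $u$ in this example may be sketched as follows. The statistic is taken to be
the difference of means divided by $\sqrt{S_1^2 + S_2^2}$ and multiplied by
$\sqrt{n_1 n_2 (n_1 + n_2 - 2)/(n_1 + n_2)}$, as in (21.32) with $\mu_1 - \mu_2 = 0$. The half-pound
values in the lists are an assumption: they are read so as to reproduce the quoted means
and sums of squares. The function name is illustrative.

```python
import math


def two_sample_u(x1, x2):
    """Return (u, degrees of freedom) for two samples assumed to share a variance."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)  # sum of squares about the mean
    ss2 = sum((x - m2) ** 2 for x in x2)
    scale = math.sqrt(n1 * n2 * (n1 + n2 - 2) / (n1 + n2))
    return (m1 - m2) / math.sqrt(ss1 + ss2) * scale, n1 + n2 - 2


first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]

u, nu = two_sample_u(first, second)
print(round(u, 2), nu)
```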


21.27. Apart from the question of unequal variances, the data of the previous 
example will serve to illustrate a further point of interest. Our hypothesis is that the 
children within each group may be regarded as a sample from a population with the same 
mean. Had we been dealing with a sample of, say, seedlings grown from the seed of a 
single plant, this hypothesis would not have been unreasonable ; but children differ very 
much among themselves in nutritional standard, and so forth. Our hypothesis is again 
liable to over-simplify the problem.

When the statistician can direct the sampling himself, this kind of problem can be
tackled with success by pairing. Suppose we select children in pairs of the same sex, 
each pair resembling each other as closely as possible in all the factors which might influence 
the experiment such as age, weight and nutritional standard. We allot at random one 
member to the first group and one to the second, and so for each pair. The differences 
in weights gained between members of a pair may then be regarded as samples from 
a population with zero mean, even if the pairs differ among themselves, and the set of 
differences tested in the usual way. 


Example 21.5 

Suppose that, in the previous example, the data had related to 10 pairs of children,
thus:

No. of Pair.   First Group     Second Group    Difference,
               wt. in lbs.     wt. in lbs.     First - Second.

     1             4               1½               2½
     2             2½              3½              -1
     3             3½              2½               1
     4             4               3                1
     5             1½              2½              -1
     6             1               2               -1
     7             3½              2                1½
     8             3               2½               ½
     9             2½              1½               1
    10             3½              3                ½

  Totals          29              24                5


For the values in the last column we find

$$\text{mean} = 0.5, \qquad \text{variance} = 1.25,$$

$$t = \frac{0.5}{\sqrt{1.25}} \sqrt{9} = 1.34, \qquad \nu = 9.$$

The probability of obtaining such a value or greater (absolutely) is about 0.22, and
the observed differences are therefore not significant. This is the same conclusion that
we reached in Example 21.4, but it would not have been surprising had the conclusions
differed, for they relate to different questions.
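
The paired comparison may be sketched in the same way. The pairing is assumed to follow
the order of the two lists, and the variance of the differences is taken with divisor $n$, so
that $t$ is the mean difference divided by the standard deviation and multiplied by $\sqrt{n - 1}$.

```python
import math

first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]

# Differences within pairs, first minus second.
diffs = [a - b for a, b in zip(first, second)]
n = len(diffs)
mean_d = sum(diffs) / n
var_d = sum((d - mean_d) ** 2 for d in diffs) / n  # divisor n

t = mean_d / math.sqrt(var_d) * math.sqrt(n - 1)
nu = n - 1
print(mean_d, var_d, round(t, 2), nu)
```

Only the ten differences enter the test, so variation between pairs does not inflate the
estimate of error.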


Difference of Means when Variances are Unequal

21.28. When population variances are not assumed equal the $t$-test of difference
of means no longer applies. We can, if we choose, apply a test based on fiducial intervals,
namely, the Behrens test, considered in the previous chapter. We put


Xi — X2 


. (21.33) 


The fiducial limits of $d$ for various significance levels have been tabulated by Sukhatme
(1938b) and Fisher (1941a) for $n_1$ and $n_2$ greater than 5. If the observed $d$ falls inside the
range, we may accept the hypothesis that the population means are equal.


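21.29. As we have seen, an inference of this kind does not imply that we shall be
correct in a certain proportion of the cases, and if we wish to find a test satisfying such
a criterion a different approach is necessary. The following investigation is due to Welch
(1938b).

Consider the distribution of $u$ of equation (21.32) when the means are the same but
the variances are different, i.e.

$$u = \frac{\bar x_1 - \bar x_2}{\sqrt{S_1^2 + S_2^2}} \sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}}. \qquad (21.34)$$

Put

$$\chi_1^2 = \frac{S_1^2}{\sigma_1^2}, \qquad \chi_2^2 = \frac{S_2^2}{\sigma_2^2}, \qquad (21.35)$$

$$\chi' = \frac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}, \qquad (21.36)$$

where $\sigma_1^2/n_1 + \sigma_2^2/n_2$ is the variance of $\bar x_1 - \bar x_2$; hence $\chi_1^2$ is distributed as $\chi^2$
with $\nu_1 = n_1 - 1$ degrees of freedom, and similarly for $\chi_2^2$. $\chi'$ may be regarded as a single
normal variate with zero mean and unit variance. We have then

$$u = \frac{\chi'}{\sqrt{w}}. \qquad (21.37)$$

Now put

$$w = a \chi_1^2 + b \chi_2^2, \qquad (21.38)$$

where, from (21.34) to (21.36),

$$a = \frac{\sigma_1^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}{(n_1 + n_2 - 2) \left( \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} \right)}, \qquad
b = \frac{\sigma_2^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}{(n_1 + n_2 - 2) \left( \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} \right)}. \qquad (21.39)$$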


$w$ itself is not distributed in the Type III form unless $\sigma_1 = \sigma_2$, but we will find a distribution
of that form which approximates to it by equating lower moments. The first two moments
of $w$, being the sum of the separate parts, are

$$\mu_1'(w) = a\nu_1 + b\nu_2, \qquad \mu_2(w) = 2(a^2 \nu_1 + b^2 \nu_2). \qquad (21.40)$$

The moments of

$$dF = \frac{1}{(2g)^{\frac12 \nu}\, \Gamma(\tfrac12 \nu)}\, w^{\frac12 \nu - 1} e^{-w/2g}\, dw$$

are

$$\mu_1' = g\nu, \qquad \mu_2 = 2g^2 \nu. \qquad (21.41)$$

Identifying (21.40) and (21.41) we find

$$g = \frac{a^2 \nu_1 + b^2 \nu_2}{a\nu_1 + b\nu_2}, \qquad
\nu = \frac{(a\nu_1 + b\nu_2)^2}{a^2 \nu_1 + b^2 \nu_2}. \qquad (21.42)$$

With these values of $g$ and $\nu$ the distribution of $w/g$ is approximately of the Type III form
with $\nu$ degrees of freedom and will be independent of $\chi'$. Hence,

$$t = \chi' \Big/ \sqrt{\frac{w}{g\nu}} = u \sqrt{g\nu} \qquad (21.43)$$

is distributed approximately as "Student's" $t$ with $\nu$ degrees of freedom. In particular,
if $\sigma_1 = \sigma_2$, $a = b$ and we reduce to the test of 21.26.


21.30. In general, when $\sigma_1 \neq \sigma_2$ the quantities $g$ and $\nu$ depend on
$\theta = \sigma_1^2/\sigma_2^2$. We have

$$\nu = \frac{(\nu_1 \theta + \nu_2)^2}{\nu_1 \theta^2 + \nu_2} \qquad (21.44)$$

and may put $u = ct$ where $c = 1/\sqrt{\nu g}$, and hence

$$c = \left\{ \frac{(\nu_1 + \nu_2) \left( \dfrac{\theta}{n_1} + \dfrac{1}{n_2} \right)}{\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right) (\nu_1 \theta + \nu_2)} \right\}^{1/2}. \qquad (21.45)$$

Without a definite knowledge of $\theta$ we cannot apply the $t$-test, but the advantage of putting
the expressions in this form is that by considering particular values of $\theta$ we are able to
judge how far the test based on "Student's" distribution is likely to be affected.
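
The dependence on $\theta$ can be tabulated with a short sketch. It assumes the forms
$\nu = (\nu_1\theta + \nu_2)^2/(\nu_1\theta^2 + \nu_2)$ and
$c^2 = (\nu_1 + \nu_2)(\theta/n_1 + 1/n_2)/\{(1/n_1 + 1/n_2)(\nu_1\theta + \nu_2)\}$ for (21.44) and (21.45),
with $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$; the function names are illustrative.

```python
import math


def nu_for_u(theta, n1, n2):
    """Approximate degrees of freedom of u when the variance ratio is theta."""
    v1, v2 = n1 - 1, n2 - 1
    return (v1 * theta + v2) ** 2 / (v1 * theta ** 2 + v2)


def c_for_u(theta, n1, n2):
    """Scale factor c such that u = c * t approximately."""
    v1, v2 = n1 - 1, n2 - 1
    num = (v1 + v2) * (theta / n1 + 1 / n2)
    den = (1 / n1 + 1 / n2) * (v1 * theta + v2)
    return math.sqrt(num / den)


for theta in (0.1, 1.0, 10.0):
    print(theta,
          round(nu_for_u(theta, 10, 10), 2), round(c_for_u(theta, 10, 10), 3),
          round(nu_for_u(theta, 5, 15), 2), round(c_for_u(theta, 5, 15), 3))
```

For equal sample numbers $c$ is unity whatever the value of $\theta$, and only the degrees of
freedom are affected; for unequal numbers $c$ departs from unity as $\theta$ departs from 1.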


Example 21.6 (from Welch, 1938b)

Consider the case $n_1 = n_2 = 10$. From (21.45) we have $c = 1$ and from (21.44)

$$\nu = \frac{9 (\theta + 1)^2}{\theta^2 + 1}.$$

Suppose now we were to use the test of 21.26, based on the assumption that $\theta = 1$. We
should find, to a probability level of 0.05, that $|u|$ must exceed 2.101 to be significant.
If we judge $u$ significant for such values how far are we in error when $\theta$ is not unity? That
is to say, what are the true probabilities

$$P\{|u| > 2.101\}$$

for varying values of $\theta$, as compared with our value of 0.05?

For a specified $\theta$ the probabilities can easily be obtained from the approximate
distribution of $u\sqrt{g\nu}$ of equation (21.43). They are shown graphically in Fig. 21.1. The full
line (a) shows $P$ for various values of $\theta$ and $n_1 = n_2 = 10$. The full line (b) shows similarly
the values for $n_1 = 5$, $n_2 = 15$. (The dotted line (c) we refer to below.)

In case (a) the line does not deviate very much from the horizontal at $P = 0.05$, and we may
conclude that the test based on the assumption of equal variance is not very much in error. In
any case, if the curve falls below the line $P = 0.05$ we are on the safe side, for our true
probability is then less than 0.05, and in rejecting the hypothesis at that level we are adopting
more stringent standards than is apparent.

In case (b), when the sample numbers are unequal we have a different state of affairs. For
$\theta < 1$ the test is very conservative, but for $\theta > 1$ it may err very seriously in the
wrong direction.

[Fig. 21.1. Values of $P$ plotted against values of $\theta$ (logarithmic scale).]

21.31. Welch concludes that for samples of equal size there is not a serious likelihood
of error in testing the difference of means as if the parent variances were equal. For
samples of unequal size the error may invalidate the $t$-test and an alternative criterion is
proposed. Write

$$v = \frac{\bar x_1 - \bar x_2}{\left\{ \dfrac{S_1^2}{n_1 (n_1 - 1)} + \dfrac{S_2^2}{n_2 (n_2 - 1)} \right\}^{1/2}}. \qquad (21.46)$$

The denominator is an estimate of $\left( \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} \right)^{1/2}$, the standard
deviation of the difference $\bar x_1 - \bar x_2$. Precisely as for $u$ we approximate to the distribution
of this denominator by a Type III form. Corresponding to (21.39) we find

$$a = \frac{\sigma_1^2}{n_1 (n_1 - 1)} \Big/ \left( \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \right), \qquad
b = \frac{\sigma_2^2}{n_2 (n_2 - 1)} \Big/ \left( \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \right). \qquad (21.47)$$

Corresponding to (21.45) we find $c = 1$, and to (21.44)

$$\nu = \left( \frac{\theta}{n_1} + \frac{1}{n_2} \right)^2 \Big/ \left\{ \frac{\theta^2}{n_1^2 (n_1 - 1)} + \frac{1}{n_2^2 (n_2 - 1)} \right\}. \qquad (21.48)$$

$v$ is then distributed approximately in "Student's" form with $\nu$ degrees of freedom. The
dotted line (c) in Fig. 21.1 shows the relationship between $\theta$ and $P\{|v| > 2.101\}$ for
$n_1 = 5$, $n_2 = 15$. Clearly the error is now much smaller than when we used $u$ for the same
sample numbers.
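
A sketch of the alternative criterion follows. It assumes $v$ to be the difference of means
divided by $\{S_1^2/n_1(n_1 - 1) + S_2^2/n_2(n_2 - 1)\}^{1/2}$ and the approximate degrees of freedom to
be $(\theta/n_1 + 1/n_2)^2/\{\theta^2/n_1^2(n_1 - 1) + 1/n_2^2(n_2 - 1)\}$. The data are those of Example 21.4
as read above, and the function names are illustrative.

```python
import math


def welch_v(x1, x2):
    """Difference of means divided by its estimated standard deviation."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    return (m1 - m2) / math.sqrt(ss1 / (n1 * (n1 - 1)) + ss2 / (n2 * (n2 - 1)))


def nu_for_v(theta, n1, n2):
    """Approximate degrees of freedom of v for variance ratio theta."""
    num = (theta / n1 + 1 / n2) ** 2
    den = theta ** 2 / (n1 ** 2 * (n1 - 1)) + 1 / (n2 ** 2 * (n2 - 1))
    return num / den


first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]

v = welch_v(first, second)
print(round(v, 3), round(nu_for_v(1.0, 10, 10), 2), round(nu_for_v(1.0, 5, 15), 2))
```

With equal sample numbers $v$ coincides with $u$; with unequal numbers the degrees of freedom
fall well below $n_1 + n_2 - 2$ even when $\theta = 1$.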
Difference of Two Variances in Normal Samples

21.32. If we have samples of $n_1$ and $n_2$ members from normal populations with
variances $\sigma_1^2$ and $\sigma_2^2$, the ratio of sample variances $p = s_1^2/s_2^2$ is distributed in the form (cf.
Example 10.18, vol. I, p. 249)

$$dF \propto \frac{p^{\frac12 (\nu_1 - 2)}\, dp}{\left( \dfrac{\nu_1 p}{\sigma_1^2} + \dfrac{\nu_2}{\sigma_2^2} \right)^{\frac12 (\nu_1 + \nu_2)}}. \qquad (21.49)$$

The related quantity

$$z = \tfrac12 \log_e p \qquad (21.50)$$

is distributed in Fisher's form

$$dF \propto \frac{e^{\nu_1 z}\, dz}{\left( \dfrac{\nu_1 e^{2z}}{\sigma_1^2} + \dfrac{\nu_2}{\sigma_2^2} \right)^{\frac12 (\nu_1 + \nu_2)}}, \qquad (21.51)$$

where $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$, the sample variances being estimated with divisors $\nu_1$
and $\nu_2$. The $\nu$'s may, by a convenient extension of our previous
terminology, be called the degrees of freedom associated with $z$. In practice, $z$ is generally
used in preference to $p$, but tables of both are available.

These distributions provide a test of significance of the ratio $\sigma_1/\sigma_2$.
On the hypothesis of equality they are independent of the ratio and the probability of
an observed $p$ or $z$ can be obtained. As usual, if this is small we reject the hypothesis.
We leave it to the reader to show that this type of inference can be based on the theory
of confidence intervals or the theory of fiducial intervals in the usual way.


Example 21.7 

In Example 21.4 we had two samples of children and found that the difference in 
means was not significant. This was on the hypothesis that the variances were identical, 
and since the two samples are equal in number the inference remains valid even if the 
variances are different, as illustrated in 21.31. We will now test directly whether the 
sample variances themselves indicate any significant difference in parent variances. 

We have

$$\sum (x_1 - \bar x_1)^2 = 9.40, \quad \nu_1 = 9; \qquad \sum (x_2 - \bar x_2)^2 = 3.90, \quad \nu_2 = 9.$$

Hence

$$z = \tfrac12 \log_e \left( \frac{9.40}{9} \Big/ \frac{3.90}{9} \right) = 0.4398.$$

From Appendix Tables 4 and 5 of Vol. I (pp. 442-3) we see that for $\nu_2 = 9$ the 5-per-cent.
points of $z$ are

$\nu_1 = 8$, 0.5862
$\nu_1 = 12$, 0.5613

and the 1-per-cent. points are

$\nu_1 = 8$, 0.8494
$\nu_1 = 12$, 0.8157.

Thus, notwithstanding that one variance is about 2½ times the other, the probability that
the observed $z$ will be exceeded on random sampling from populations with the same
variance is greater than 0.05, and the difference of sample variances is not significant.

There is a point here which is frequently overlooked. In carrying out the $z$-test we
always take the ratio of the larger variance to the smaller, so that our probability levels
relate, not to the chance that a given pair of variances have a larger ratio than the observed
one, but to the chance that the bigger of the two exceeds the smaller in a certain ratio.
A probability of 0.05 thus relates to the chance that either $s_1^2/s_2^2$ exceeds a given amount
$k$, or $s_1^2/s_2^2$ falls short of a given amount $1/k$. If we are interested only in the former
contingency our probabilities should be halved.
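
The value of $z$ in Example 21.7 can be checked by a short sketch, taking $z$ as half the
natural logarithm of the ratio of the two variance estimates, the larger being placed in
the numerator. The function name is illustrative.

```python
import math


def fisher_z(ss_a, nu_a, ss_b, nu_b):
    """Half the natural log of the ratio of two variance estimates (larger on top)."""
    var_a, var_b = ss_a / nu_a, ss_b / nu_b
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    return 0.5 * math.log(larger / smaller)


z = fisher_z(9.40, 9, 3.90, 9)
print(round(z, 4))
```

The result falls short of the 5-per-cent. points quoted for $\nu_1 = 8$ and $\nu_1 = 12$.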


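Properties of Fisher's Distribution

21.33. The $z$-distribution plays a very important part in statistical inference based
on small samples, and we digress at this point to give an account of its main features.

The distribution function of $z$ may be obtained from the incomplete $B$-function, for
$z$ may be easily transformed into a Type I variate. There are, however, special tables
for lower values of $\nu_1$ and $\nu_2$ and satisfactory approximations of various kinds for higher
values.

The characteristic function of $z$ is proportional to

$$\int_{-\infty}^{\infty} \frac{e^{\theta z}\, e^{\nu_1 z}\, dz}{(\nu_1 e^{2z} + \nu_2)^{\frac12 (\nu_1 + \nu_2)}},$$

where $\theta = it$, and is thus

$$\phi(t) = \left( \frac{\nu_2}{\nu_1} \right)^{\frac12 \theta}
\frac{\Gamma\{\tfrac12 (\nu_1 + \theta)\}\, \Gamma\{\tfrac12 (\nu_2 - \theta)\}}{\Gamma(\tfrac12 \nu_1)\, \Gamma(\tfrac12 \nu_2)}. \qquad (21.52)$$

Thus, taking logarithms and using the expansion

$$\log \Gamma(1 + x) = \tfrac12 \log 2\pi + (x + \tfrac12) \log x - x + \frac{1}{12x} - \ldots,$$

we find

$$\log \phi(t) = -\frac{\theta}{2} \left( \frac{1}{\nu_1} - \frac{1}{\nu_2} \right)
+ \frac{\theta^2}{4} \left( \frac{1}{\nu_1} + \frac{1}{\nu_2} \right) + \ldots$$

Thus, for large $\nu_1$ and $\nu_2$, $z$ is distributed normally with

$$\text{mean } -\frac12 \left( \frac{1}{\nu_1} - \frac{1}{\nu_2} \right) \quad \text{and variance } \frac12 \left( \frac{1}{\nu_1} + \frac{1}{\nu_2} \right). \qquad (21.53)$$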


21.34. Various approximations have been given for the case when $\nu_1$ and $\nu_2$ are
not large enough to justify the assumption of normality.

(a) (Cornish and Fisher, 1937). The method is that of 6.32 and depends on the
expansion of the distribution in a Gram-Charlier series. From the successive derivatives
of $\log \Gamma(1 + x)$ we can find those of $\log \phi(t)$, and hence ascertain the cumulants of $z$. Writing
$r_1 = \dfrac{1}{\nu_1}$ and $r_2 = \dfrac{1}{\nu_2}$, we find

$$\begin{aligned}
\kappa_1 &= -\tfrac12 (r_1 - r_2) - \tfrac16 (r_1^2 - r_2^2) \\
\kappa_2 &= \tfrac12 (r_1 + r_2) + \tfrac12 (r_1^2 + r_2^2) + \tfrac13 (r_1^3 + r_2^3) \\
\kappa_3 &= -\tfrac12 (r_1^2 - r_2^2) - (r_1^3 - r_2^3) \\
\kappa_4 &= r_1^3 + r_2^3 + 3 (r_1^4 + r_2^4) \\
\kappa_5 &= -3 (r_1^4 - r_2^4) \\
\kappa_6 &= 12 (r_1^5 + r_2^5)
\end{aligned} \qquad (21.54)$$

Hence, putting $\sigma = r_1 + r_2$ and $\delta = r_1 - r_2$, we find for the $l$'s of 6.32 ($m = 0$,
variance $= \tfrac12 \sigma$):


h 


7! 


(I 5 + I <5cr) 


+ -6- 3 


and so on. After some reduction we find, for the value of $z$ corresponding to a probability
$\alpha$ (which in turn corresponds to a normal deviate $\xi$):




2 


Id (f‘-^ + 2) + 


(^3 4- 31) + i- + 111) 

2 I 24^" ^ ^ 72 CT 




a a‘ 


120 


2 1 1920 


(|5 4_ 20P 4- 151) 




(21.55) 


2880 


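(b) (Fisher, extended by Cochran, 1940a). Writing $n$ indifferently for $\nu_1$ and $\nu_2$ we
have, from (21.55), to order $n^{-3/2}$,

$$z = \xi \sqrt{\tfrac12 \sigma} - \tfrac16 \delta (\xi^2 + 2)
+ \sqrt{\tfrac12 \sigma} \left\{ \frac{\sigma}{24} (\xi^3 + 3\xi) + \frac{\delta^2}{72 \sigma} (\xi^3 + 11\xi) \right\}.$$

Put $h = 2/\sigma$. Then

$$z = \frac{\xi}{\sqrt h} - \tfrac16 \delta (\xi^2 + 2)
+ \frac{1}{\sqrt h} \left\{ \frac{\xi^3 + 3\xi}{12 h} + \frac{h \delta^2}{144} (\xi^3 + 11\xi) \right\}. \qquad (21.56)$$

Now

$$\frac{1}{\sqrt{h - \lambda}} = \frac{1}{\sqrt h} + \frac{\lambda}{2 h \sqrt h} + O(n^{-5/2}).$$

Hence, if we put

$$z = \frac{\xi}{\sqrt{h - \lambda}} - \tfrac16 \delta (\xi^2 + 2), \qquad (21.57)$$

the difference of this quantity from (21.56) is

$$\frac{\sqrt h\, \delta^2}{144} (\xi^3 + 11\xi),$$

provided that we take $\lambda = \tfrac16 (\xi^2 + 3)$.

The difference is small in virtue of the large denominator and the factor
$\delta^2 = \left( \dfrac{1}{\nu_1} - \dfrac{1}{\nu_2} \right)^2$, which is small if $\nu_1$ and $\nu_2$ are not too different. Thus
approximate values of $z$ are given by (21.57). The values of $\lambda$ for various values of the
significance level are

Level        40%    30%    20%    10%    5%     1%     0.1%
$\lambda$    0.51   0.55   0.62   0.77   0.95   1.40   2.09

For the commoner levels of significance the form taken by (21.57) is

20 per cent. level: $z = \dfrac{0.8416}{\sqrt{h - \lambda}} - 0.4514\, \delta$ . . . (21.58)

5 per cent. level: $z = \dfrac{1.6449}{\sqrt{h - \lambda}} - 0.7843\, \delta$ . . . (21.59)

1 per cent. level: $z = \dfrac{2.3263}{\sqrt{h - \lambda}} - 1.235\, \delta$ . . . (21.60)

0.1 per cent. level: $z = \dfrac{3.0902}{\sqrt{h - \lambda}} - 1.9255\, \delta$ . . . (21.61)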

The accuracy of the approximation for $\nu_1 = 24$, $\nu_2 = 60$ may be judged from the following
comparison:

Level          Value of z from     Exact Value.
per cent.      (21.57).

  20              0.1337             0.1338
   1              0.3748             0.3746
   0.1            0.4966             0.4955
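
The comparison can be reproduced approximately with the sketch below. It assumes the form
$z = \xi/\sqrt{h - \lambda} - \tfrac16 \delta (\xi^2 + 2)$ with $h = 2/(1/\nu_1 + 1/\nu_2)$, $\delta = 1/\nu_1 - 1/\nu_2$ and
$\lambda = \tfrac16 (\xi^2 + 3)$, and obtains the normal deviate $\xi$ from the standard library; the
function name is illustrative.

```python
import math
from statistics import NormalDist


def z_point(level, nu1, nu2):
    """Approximate upper significance point of z at the given level (e.g. 0.05)."""
    xi = NormalDist().inv_cdf(1.0 - level)  # normal deviate for the level
    sigma = 1 / nu1 + 1 / nu2
    delta = 1 / nu1 - 1 / nu2
    h = 2 / sigma
    lam = (xi ** 2 + 3) / 6
    return xi / math.sqrt(h - lam) - delta * (xi ** 2 + 2) / 6


for level in (0.20, 0.01, 0.001):
    print(level, round(z_point(level, 24, 60), 4))
```

Because $\lambda$ and the coefficient of $\delta$ are computed here without rounding, the last figure
may differ slightly from the tabulated values.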


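(c) (Paulson, 1942). The Wilson-Hilferty approximation to $\chi^2$ of 12.7 indicates that
$\left( \dfrac{\chi^2}{\nu} \right)^{1/3}$ is distributed normally about mean $1 - \dfrac{2}{9\nu}$ with variance $\dfrac{2}{9\nu}$. The ratio
$s_1^2/s_2^2$ itself is the ratio of two independent quantities distributed as $\chi^2/\nu$ with $\nu_1$ and
$\nu_2$ degrees of freedom. Further, in virtue of Geary's theorem (Vol. I, p. 253) a simple
function of the ratio of two normal variates is approximately normally distributed in
standard measure.

We may thus regard

$$\frac{\left( 1 - \dfrac{2}{9\nu_2} \right) \left( \dfrac{s_1^2}{s_2^2} \right)^{1/3} - \left( 1 - \dfrac{2}{9\nu_1} \right)}
{\left\{ \dfrac{2}{9\nu_2} \left( \dfrac{s_1^2}{s_2^2} \right)^{2/3} + \dfrac{2}{9\nu_1} \right\}^{1/2}} \qquad (21.62)$$

as approximately normally distributed in standard measure. The approximation seems
remarkably good. For instance, the following shows the exact and approximate values
of $s_1^2/s_2^2$ for $\nu_1 = 6$, $\nu_2 = 12$.

Level          Value from       Exact Value.
per cent.      (21.62).

  20              1.72             1.72
   5              3.00             3.00
   1              4.85             4.82
   0.1            8.58             8.38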
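
The quality of this approximation can be examined by a sketch which evaluates the
normal deviate for a given variance ratio. It assumes the deviate to be
$\{(1 - 2/9\nu_2) F^{1/3} - (1 - 2/9\nu_1)\}/\{(2/9\nu_2) F^{2/3} + 2/9\nu_1\}^{1/2}$, where $F$ stands for $s_1^2/s_2^2$;
the symbol $F$ and the function name are introduced here for illustration only.

```python
import math


def paulson_deviate(f, nu1, nu2):
    """Approximate standard normal deviate corresponding to a variance ratio f."""
    a1 = 2.0 / (9.0 * nu1)
    a2 = 2.0 / (9.0 * nu2)
    x = f ** (1.0 / 3.0)
    return ((1.0 - a2) * x - (1.0 - a1)) / math.sqrt(a2 * x * x + a1)


# Exact 5 and 1 per cent. points for nu1 = 6, nu2 = 12, as tabulated above.
print(round(paulson_deviate(3.00, 6, 12), 3))  # to be compared with 1.645
print(round(paulson_deviate(4.82, 6, 12), 3))  # to be compared with 2.326
```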
The Problem of k Samples

21.35. We now proceed to consider the case when we have samples from k different 
populations and wish to determine whether there is any evidence of significant differences 
between those populations. In some cases the appropriate test can be carried out by the 
$\chi^2$-distribution, particularly if the data are grouped. For the groups may then be regarded
as determining the rows of a contingency table and the different samples the columns, and 
a homogeneity test applied to the table in the manner of Chapter 12. Again, we may 
compare the samples pair by pair by the foregoing methods ; but this, apart from being 
tedious, does not give us what we want, namely a test of homogeneity of the set of samples 
taken together. 

21.36. Consider in the first instance the sampling of attributes. Suppose we have
$k$ samples from populations in which the true proportions of successes are $\varpi_1, \ldots, \varpi_k$,
the observed proportions being $p_1, \ldots, p_k$ and the sample numbers $n_1, \ldots, n_k$, totalling $n$.
If $p$ is the mean proportion of successes in all samples taken together, and our hypothesis
is that the populations have a common value $\varpi$, $p$ will be an estimate of $\varpi$ and we have for
the variance of $p_j$

$$\operatorname{var} p_j = \frac{\varpi (1 - \varpi)}{n_j} = \frac{pq}{n_j} \text{ approximately,} \qquad (21.63)$$

where $q = 1 - p$. It follows that $(p_j - p) \sqrt{\dfrac{n_j}{pq}}$ will be distributed normally about zero
mean with unit variance, and hence

$$\chi^2 = \frac{\sum \{ n_j (p_j - p)^2 \}}{pq} \qquad (21.64)$$

is distributed in the Type III form with $k - 1$ degrees of freedom (not $k$ because we have lost
a degree by estimating $p$). Hence the ratio

$$Q^2 = \frac{\sum n_j (p_j - p)^2}{pq (k - 1)} \qquad (21.65)$$

has expectation unity. The quantity $Q$ is called the Lexis ratio, after the author who
first discussed it in detail (Lexis, 1903).*


* Lexis first developed the use of $Q$ in a paper "Über die Theorie der Stabilität statistischer Reihen,"
1879, Conrad's Jahrbücher, 32, 60, reproduced in the reference given above. He dealt, however, only
with the case when all the $n$'s were equal and had no knowledge of the sampling distribution of $Q$. In
practical applications he took as each $n_j$ the average for the group. "The error thereby committed
can be judged by carrying out the calculation once with the largest and once with the smallest
base number."

Example 21.8

From 1910 to 1919 the numbers of live male and female births in England and Wales
were as follows:

Year.      Male Births.    Female Births.    Total Births.    Proportion
                                                              Male/Total.

1910          457,266          439,696           896,962        0.5098
1911          448,933          432,205           881,138        0.5095
1912          445,004          427,733           872,737        0.5099
1913          449,159          432,731           881,890        0.5093
1914          447,184          431,912           879,096        0.5087
1915          415,205          399,409           814,614        0.5097
1916          402,137          383,383           785,520        0.5119
1917          341,361          326,985           668,346        0.5108
1918          339,112          323,549           662,661        0.5117
1919          356,241          336,197           692,438        0.5145

Totals      4,101,602        3,933,800         8,035,402        0.5104

The proportion of male births showed an increase during the war years 1916-1919.
This is a well-known effect of war, but suppose we had noticed it here for the first time.
The natural question is: can the effect be accidental? There is no doubt about its reality,
for the data cover the whole population; but if we suppose that sex at birth is distributed
according to the laws of chance, do the differences observed suggest that in the ten years
concerned there was a significant change in the population (as regards proportion of male
births)? Let us consider the homogeneity test applied to the 10 proportions.

We have $p = 0.5104$, $n = 8,035,402$, $k - 1 = \nu = 9$ and the sum $\sum n_j (p_j - p)^2$ will
be found to be 19.895783. Hence

$$Q = \sqrt{\frac{19.895783}{9 \times 0.5104 \times 0.4896}} = 2.974,$$

$$\chi^2 = (k - 1) Q^2 = 79.618.$$

$Q$ is sufficiently far from unity to reject decisively the hypothesis that the data are
homogeneous. A $\chi^2$-test will confirm the conclusion. We infer that, whatever the reason,
the differences in proportions of male births, slight as they are, cannot be accounted for
on the supposition that the distribution of sex is according to chance in samples from
a constant population. We may observe that, had we obtained the same proportions
for a sample one-tenth the size, $\chi^2$ would have been 7.962 and we should not have inferred
non-homogeneity.
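
The calculation may be retraced from the birth figures by the following sketch, which takes
$Q^2 = \sum n_j (p_j - p)^2 / \{pq (k - 1)\}$. The proportions are here computed from the counts
without rounding, so the results are expected to differ a little in the later figures from
those quoted, which appear to rest on the four-figure proportions. The function name is
illustrative.

```python
import math

male = [457266, 448933, 445004, 449159, 447184,
        415205, 402137, 341361, 339112, 356241]
total = [896962, 881138, 872737, 881890, 879096,
         814614, 785520, 668346, 662661, 692438]


def lexis_ratio(successes, sizes):
    """Return (Q, chi-square) for k samples of attributes."""
    k = len(sizes)
    p = sum(successes) / sum(sizes)  # pooled proportion of successes
    q = 1.0 - p
    weighted = sum(n * (x / n - p) ** 2 for x, n in zip(successes, sizes))
    chi2 = weighted / (p * q)
    return math.sqrt(chi2 / (k - 1)), chi2


Q, chi2 = lexis_ratio(male, total)
print(round(Q, 3), round(chi2, 2))
```

Dividing every count by ten leaves the proportions unchanged but reduces $\chi^2$ tenfold, which
is the point made at the end of the example.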


21.37. A similar test may be applied with $k$ samples of variables. Let the samples be

$x_{11}, x_{12}, \ldots, x_{1 n_1}$ with mean $\bar x_1$,
$x_{21}, x_{22}, \ldots, x_{2 n_2}$ with mean $\bar x_2$,
. . . . . .
$x_{k1}, x_{k2}, \ldots, x_{k n_k}$ with mean $\bar x_k$.

The variance of the $j$th sample is

$$\frac{1}{n_j} \sum_i (x_{ji} - \bar x_j)^2,$$

and an estimate of the population variance may be obtained by taking the weighted mean
of sample variances

$$s_w^2 = \frac{1}{n - k} \sum_j \sum_i (x_{ji} - \bar x_j)^2. \qquad (21.66)$$

Here we have reduced the divisor to $n - k$ so as to correspond with the number of degrees
of freedom.

Furthermore $\bar x_j$ will be distributed with variance $\dfrac{\sigma^2}{n_j}$ and hence (assuming without
loss of generality that the parent mean is zero),

$$E \sum_j \{ n_j (\bar x_j - \bar x)^2 \} = E \Big\{ \sum_j (n_j \bar x_j^2) - n \bar x^2 \Big\}
= k \sigma^2 - \sigma^2 = (k - 1) \sigma^2.$$

Putting then

$$s_m^2 = \frac{1}{k - 1} \sum_j n_j (\bar x_j - \bar x)^2, \qquad (21.67)$$

we have another estimate of $\sigma^2$. Within sampling limits $s_w^2$ and $s_m^2$ should be equal. If
they are not, we suspect the homogeneity of the population.

21.38. The above test is a simple form of the analysis of variance, which we shall 
study extensively in Chapters 23 and 24 ; it is therefore unnecessary for us to develop it 
further at the present stage. Essentially the test is one of simultaneous significance of 
differences between means on the assumption that variances are constant. We shall also 
discuss in Chapter 26 a generalisation of the variance ratio for testing the homogeneity 
of a set of variances. 


Example 21.9 

The following table (from the Registrar-General's Statistical Review of England and
Wales for 1932, Part II) shows the numbers of males married in England in that year
classified according to age and district. (Certain small numbers of unspecified age and
those under 21 have been omitted.)

                                  Age (Years).
District.        21-       25-      30-      35-     45-     55-     Totals.
South-East     31,714    43,979   14,995    7,985   3,928   3,717   106,318
North          31,507    39,849   13,620    7,108   3,362   2,916    98,362
Midland        17,465    21,486    6,729    3,340   1,624   1,509    52,153
East            4,016     5,297    1,820      962     457     386    12,938
South-West      4,323     6,065    2,218    1,177     514     580    14,877
Totals         89,025   116,676   39,382   20,572   9,885   9,108   284,648

Note the changes in interval at 25- and 35- years.
The question we shall consider is whether age at marriage differs significantly between
different districts. This might, for example, be an important point if we were about to
sample the population for some quality related to age at marriage, such as the number
of children per family. The data might be regarded as a contingency table and used
as a test of independence in the usual way. Here we adopt an alternative by considering
the mean age at marriage in the five different districts.

Taking the centres of the intervals to be 23, 27.5, 32.5, 40, 50 and 57.5 years (the last
being admittedly an approximation) and making no corrections for grouping, we find:


                                  Mean        Sum of Squares
District.          Number.      (years).      of Deviations     Variance.
                                              from Mean.
South-East         106,318     29.681799        7,092,490        66.710
North               98,362     29.312626        6,092,375        61.938
Midland             52,153     29.007344        3,105,520        59.546
East                12,938     29.425761          807,911        62.445
South-West          14,877     29.873731        1,025,284        68.917
Whole population   284,648     29.429049       18,143,921        63.741


The total of the sum of squares about district means, $\sum_j \sum_i (x_{ji} - \bar{x}_j)^2$, is the sum of the
figures in the fourth column, namely 18,123,580. The sum of squares $\sum_j n_j (\bar{x}_j - \bar{x})^2$ is
found to be 20,341. We have the useful check that these two together are equal to the
sum of squares of deviations from the population mean, 18,143,921 (a property which we
shall often require in the analysis of variance).

Thus

$$s_w^2 = \frac{18,123,580}{284,643} = 63.67, \qquad s_m^2 = \frac{20,341}{4} = 5085.25.$$


No test of significance is required to see that the difference in mean age at marriage between 
districts is not a chance effect. 
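
The arithmetic of this example can be retraced with a short computation. The sketch below is only illustrative: the function and variable names are our own, the interval centres are those assumed in the text, and the Midland frequency for the group 55- is taken as 1,509, the value consistent with the district total 52,153 and the printed district mean.

```python
# Grouped frequencies of Example 21.9 (age-groups 21-, 25-, 30-, 35-, 45-, 55-).
# Assumption: Midland 55- is taken as 1,509 so that the row agrees with its total.
CENTRES = [23.0, 27.5, 32.5, 40.0, 50.0, 57.5]
COUNTS = {
    "South-East": [31714, 43979, 14995, 7985, 3928, 3717],
    "North": [31507, 39849, 13620, 7108, 3362, 2916],
    "Midland": [17465, 21486, 6729, 3340, 1624, 1509],
    "East": [4016, 5297, 1820, 962, 457, 386],
    "South-West": [4323, 6065, 2218, 1177, 514, 580],
}


def grouped_mean_and_ss(freqs, centres):
    """Return (number, mean, sum of squares about the mean) for grouped data."""
    n = sum(freqs)
    mean = sum(f * c for f, c in zip(freqs, centres)) / n
    ss = sum(f * (c - mean) ** 2 for f, c in zip(freqs, centres))
    return n, mean, ss


def variance_estimates(counts, centres):
    """Within-sample and between-means estimates of variance, (21.66) and (21.67)."""
    stats = [grouped_mean_and_ss(f, centres) for f in counts.values()]
    n = sum(s[0] for s in stats)
    k = len(stats)
    grand_mean = sum(s[0] * s[1] for s in stats) / n
    within_ss = sum(s[2] for s in stats)
    between_ss = sum(s[0] * (s[1] - grand_mean) ** 2 for s in stats)
    return within_ss / (n - k), between_ss / (k - 1), within_ss, between_ss


s_w2, s_m2, within_ss, between_ss = variance_estimates(COUNTS, CENTRES)
print(f"within SS  = {within_ss:,.0f}, estimate = {s_w2:.2f}")
print(f"between SS = {between_ss:,.0f}, estimate = {s_m2:.2f}")
```

The two sums of squares add to the sum of squares about the mean of the whole population, which is the check mentioned in the text.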


Tests of Random Order

21.39. The tests described above are concerned with the values of a number of
sample members but not with the order in which these values occur. Sometimes there
may not be an order, as, for instance, if a number of plants are grown simultaneously or
a number of names drawn from a hat in a single handful. More frequently there is a temporal
order of appearance in the values, and it is clear that, on some occasions at least,
the order may be material. To take an extreme case, suppose we are told that in a sample
of 100 births 53 are male. We conclude that the sample is concordant with the hypothesis
that male and female births occur at random with probability $\frac{1}{2}$. But if we knew in addition
that the first 53 births were male and the next 47 female we should almost certainly reject
the hypothesis.

21.40. If sampling is conducted by taking members one at a time from a population
and the process is random, then any order is as probable as any other order. The sample
may be considered as a section of an infinite series generated by the sampling process, and
this series ought to behave like von Mises' Irregular Kollektiv (7.15). It is a happy
hunting-ground for the theorist, since there is no limit to the number of tests which can
be invented to ascertain whether a given finite series conforms to the random scheme. We
have considered a few such tests in connection with random sampling numbers (8.15)
and shall discuss others in connection with time-series (Chapter 30). Here we discuss a
few tests which are useful in detecting departures from randomness in the sampling. We
are not now considering hypotheses as to the parent population, but since the randomness
of the sampling is an essential element of inferences in probability it is convenient to
consider the reliability of the sampling, together with inferences from the sample about
the parent.


Ranking Tests 

21.41. Suppose we have a sample of $n$ members $x_1 \ldots x_n$, in that order, and are
doubtful about its randomness. Such doubts may arise owing either to defects in the
sampling or to possible alterations in the population while the sampling is going on. In
the first case the process itself is at fault; in the second, circumstances are at work to make
the sample something other than it purports to be, a random sample from a single population.
Either influence may relate the magnitude of the $x$'s to the order in which they
occur, and the values $x_1 \ldots x_n$ are not then a random order in the sense that any other
order was equally probable.

Let us then consider all the possible orders, $n!$ in number, of the observed values
$x_1 \ldots x_n$. A proportion of these, determined by a significance level of 5 per cent. or
1 per cent., say, we will decide to reject as improbable; and we will select as the "improbable"
rankings those which exhibit the systematic appearance of which we are afraid,
and particularly the regular rise or fall from $x_1$ to $x_n$ in magnitude. In short, we rank the
sample in order of magnitude, say $X_1 \ldots X_n$, where the $X$'s are a permutation of
the first $n$ integers, and compute a rank correlation coefficient between this order and the
order $1 \ldots n$. If the coefficient is large in absolute value ("large" being determined
by the significance level) we suspect the sample of being subject to systematic influences.


Example 21.10 

Thirty persons in the income group £1000-£1500 are asked to supply returns of their
annual income for some purpose connected with taxation. It is intended to summarise
their replies by a given date, but when that date arrives only 20 answers have been received.
This is a frequent event in postal inquiries, even when the return is compulsory, and it
has to be decided whether the 20 returns may be accepted as representative of the 30.
There are prior reasons for suspecting that persons with bigger incomes may delay more
than the others, partly because of difficulty in completing returns and partly because of
a natural reluctance to part with information which may tell against them.* We therefore
wish to ascertain from the 20 returns whether there is any evidence that persons with
smaller incomes tend to submit returns earlier than those with larger incomes.

* This is an assumption for the purposes of the example and not intended as a statement about
taxation returns in real life.

Suppose the 20 returns give incomes, in that order, of £ per annum: 1180, 1270, 1400,
1090, 1190, 1250, 1170, 1300, 1290, 1310, 1280, 1350, 1320, 1380, 1420, 1390, 1470, 1360,
1220, 1460. The ranking order is:

No. of sample    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Rank             3   7  17   1   4   6   2  10   9  11   8  13  12  15  18  16  20  14   5  19
Difference      -2  -5 -14   3   1   0   5  -2   0  -1   3  -1   1  -1  -3   0  -3   4  14   1

The sum of squares of differences is 508 and thus the Spearman coefficient of rank
correlation between observed and natural order is

$$\rho = 1 - \frac{6 \times 508}{20^3 - 20} = 0.618.$$

The probability of obtaining such a value or greater (16.18) may be found from
"Student's" distribution by putting

$$t = \rho \left\{\frac{n-2}{1-\rho^2}\right\}^{\frac{1}{2}}, \qquad \nu = 18,$$

and is found from Appendix Table 3, vol. I, to be about 0.004. The test confirms our
suspicion that size of income is correlated with order of appearance, and if we intend to
use the mean income of the 20 returns as an estimate of the income in the full 30 we must
recognise that it may very well be an under-estimate.
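
The ranking and the coefficient of this example may be checked mechanically. The following is a minimal sketch, assuming no ties among the values; the function names are illustrative only, and the eighteenth income is read as 1360, the value consistent with its rank of 14.

```python
import math

INCOMES = [1180, 1270, 1400, 1090, 1190, 1250, 1170, 1300, 1290, 1310,
           1280, 1350, 1320, 1380, 1420, 1390, 1470, 1360, 1220, 1460]


def ranks(values):
    """Rank the values 1..n in order of magnitude (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_against_order(values):
    """Spearman's rho between the order of appearance 1..n and the ranks."""
    n = len(values)
    d2 = sum((i - r) ** 2 for i, r in enumerate(ranks(values), start=1))
    rho = 1 - 6 * d2 / (n ** 3 - n)
    # Transformation to "Student's" form with n - 2 degrees of freedom.
    t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
    return d2, rho, t


d2, rho, t = spearman_against_order(INCOMES)
print(d2, round(rho, 3), round(t, 2))
```

The value of t is then referred to "Student's" distribution with 18 degrees of freedom, as in the text.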


21.42. It will be noted in this example that we have made no assumption about
the distribution of incomes in the sample or the population (the latter of which would
certainly not be normal) and have used the sample values themselves without any reference
to the question whether they were representative. This does not invalidate our inference,
which is made within the population of samples obtained by permuting the observed values.
(Cf. 17.44 and 17.45.)


21.43. A second test of use in random series, particularly when it is suspected that
cyclical effects are present, may be obtained by counting the occurrences of "peaks" or
"troughs" in the series. A member is said to be a "peak" if it is greater than the two
neighbouring members, and a "trough" if it is less than those members. In either case
it is a "turning-point". The interval between turning-points is called a "phase".

Three consecutive observations are required to define a turning-point. If the series
is random the probability that any given three provides a turning-point is $\frac{2}{3}$, for the values
$x_1, x_2, x_3$ may occur in six orders and in only four is the greatest or least value the middle
one. In a series of $N$ terms there are $N - 2$ sets of three, and hence the expected number
of turning-points $p$ is

$$E(p) = \tfrac{2}{3}(N - 2) \tag{21.68}$$

The variance and higher moments of $p$ are not so easy to determine. Like the ranking
problems considered in Chapter 16 (to which the present problem is analogous), the distributions
resulting are rather complicated. We quote without proof the results

$$\mu_2(p) = \frac{16N - 29}{90} \tag{21.69}$$

$$\mu_3(p) = -\frac{16(N + 1)}{945} \tag{21.70}$$

$$\mu_4(p) = \frac{448N^2 - 1976N + 2301}{4725} \tag{21.71}$$

As $N$ tends to infinity the distribution tends to normality fairly rapidly, and $p$ may,
for finite $N$, be taken as normally distributed about mean $\frac{2}{3}(N - 2)$ with variance

$$\frac{16N - 29}{90} \tag{21.72}$$
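
A turning-point count and its normal approximation are easily programmed. The sketch below is illustrative only (the function names are ours, and ties between neighbouring values are assumed absent); it also checks the probability 2/3 and the mean and variance for a series of six terms by enumerating all orders.

```python
import math
from itertools import permutations


def turning_points(series):
    """Count peaks and troughs; assumes no ties between neighbouring members."""
    count = 0
    for a, b, c in zip(series, series[1:], series[2:]):
        # b is a peak or a trough when it lies on the same side of both neighbours.
        if (b - a) * (b - c) > 0:
            count += 1
    return count


def turning_point_test(series):
    """Return (p, expected value, variance, normal deviate) for the series."""
    n = len(series)
    p = turning_points(series)
    mean = 2 * (n - 2) / 3
    variance = (16 * n - 29) / 90
    return p, mean, variance, (p - mean) / math.sqrt(variance)


# Of the six orders of three values, those giving a turning-point:
favourable = sum(turning_points(order) for order in permutations((1, 2, 3)))

# Mean and variance of p over all 720 orders of six distinct values.
counts = [turning_points(order) for order in permutations(range(6))]
enumerated_mean = sum(counts) / len(counts)
enumerated_variance = sum(c * c for c in counts) / len(counts) - enumerated_mean ** 2

print(favourable, enumerated_mean, enumerated_variance)
print(turning_point_test([3, 1, 4, 1.5, 5, 9, 2, 6, 5.5, 3.5]))
```

The series in the last line is an arbitrary illustration, not data from the text.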


21.44. A further test may be derived from the distribution of phase lengths. The
probability of a phase of length $d$ in a series of $d + 1$ terms is clearly $2/(d+1)!$, for only
two of the possible permutations are favourable. In a series of length $N$ there are
$N - d - 2$ possible phases of length $d$, for $d + 3$ points are required to determine the
phase. The probability of a phase $d$ in $d + 3$ terms is

$$2\left\{\frac{1}{(d+1)!} - \frac{2}{(d+2)!} + \frac{1}{(d+3)!}\right\} = \frac{2(d^2 + 3d + 1)}{(d+3)!}$$

and hence the expected number of phases of length $d$ is

$$\frac{2(N - d - 2)(d^2 + 3d + 1)}{(d+3)!} \tag{21.73}$$

Now the expected total number of phases is

$$\tfrac{1}{3}(2N - 7) \tag{21.74}$$

for there is one fewer phase than turning-points, and the expected number of the latter is
$\frac{2}{3}(N - 2)$. For the distribution of phase lengths in a series of $N$ we then have (21.73)
divided by (21.74),

$$\frac{6(N - d - 2)(d^2 + 3d + 1)}{(d+3)!\,(2N - 7)} \tag{21.75}$$

The moments of this distribution are easily obtained to a very close approximation.

$$\mu_1'(d) = \frac{6}{2N-7} \sum_{d=1}^{N-3} \frac{d\,(N - d - 2)(d^2 + 3d + 1)}{(d+3)!}$$

$$= \frac{6}{2N-7} \sum \left[\frac{(N-2)\{(d+3)(d+2)(d+1) - 3(d+3)(d+2) + 5(d+3) - 3\}}{(d+3)!} - \frac{d^2(d^2 + 3d + 1)}{(d+3)!}\right]$$

$$= \frac{6}{2N-7} \sum \left[(N-2)\left\{\frac{1}{d!} - \frac{3}{(d+1)!} + \frac{5}{(d+2)!} - \frac{3}{(d+3)!}\right\} - \left\{\frac{1}{(d-1)!} - \frac{3}{d!} + \frac{8}{(d+1)!} - \frac{13}{(d+2)!} + \frac{9}{(d+3)!}\right\}\right].$$

Remembering the rapid convergence of $\sum 1/d!$ to $e$, we may write this as

$$\frac{6}{2N-7}\left[(N-2)\{e - 1 - 3(e - 2) + 5(e - \tfrac{5}{2}) - 3(e - \tfrac{8}{3})\} - e + 3(e - 1) - 8(e - 2) + 13(e - \tfrac{5}{2}) - 9(e - \tfrac{8}{3})\right]$$

$$= \frac{3(N + 7 - 4e)}{2N - 7} \sim \frac{3}{2} \tag{21.76}$$
Similarly we find

$$\mu_2(d) = \frac{3}{(2N-7)^2}\{(8e - 21)N^2 + (4e - 17)N - (48e^2 - 140e + 14)\} \sim 0.560. \tag{21.77}$$
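
The approximations for the mean and variance of phase length can be compared with direct summation over the distribution of phase lengths, $6(N-d-2)(d^2+3d+1)/\{(d+3)!\,(2N-7)\}$. The sketch below is a numerical check under our own naming; the closed form used for the variance is the one obtained by carrying the same reduction as for the mean one stage further, and should be read as our working rather than as a quotation.

```python
import math


def phase_length_probability(d, n):
    """Probability that a phase has length d in a random series of n terms."""
    return 6 * (n - d - 2) * (d * d + 3 * d + 1) / (math.factorial(d + 3) * (2 * n - 7))


def phase_length_moments(n):
    """Total probability, mean and variance by direct summation over d = 1..n-3."""
    lengths = range(1, n - 2)
    probs = [phase_length_probability(d, n) for d in lengths]
    total = sum(probs)
    mean = sum(d * p for d, p in zip(lengths, probs))
    variance = sum(d * d * p for d, p in zip(lengths, probs)) - mean ** 2
    return total, mean, variance


def approximate_mean(n):
    """Closed form 3(N + 7 - 4e)/(2N - 7), tending to 3/2."""
    return 3 * (n + 7 - 4 * math.e) / (2 * n - 7)


def approximate_variance(n):
    """Closed form for the variance (our derivation), tending to about 0.560."""
    e = math.e
    bracket = (8 * e - 21) * n * n + (4 * e - 17) * n - (48 * e * e - 140 * e + 14)
    return 3 * bracket / (2 * n - 7) ** 2


total, mean, variance = phase_length_moments(48)
print(total, mean, approximate_mean(48))
print(variance, approximate_variance(48), 3 * (8 * math.e - 21) / 4)
```

For a series of 48 terms the summed and closed-form values agree to many decimal places, the neglected tail of the series being negligible.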

21.45. In comparing observed distributions of phases with expected values the
ordinary $\chi^2$-test cannot be applied, because the probabilities of the events in a finite series
are not independent. A test of significance has been derived by Wallis and Moore (1941),
who consider a grouping into three categories, $d = 1$, $d = 2$ and $d \ge 3$. They conclude
that $\chi^2$ calculated from these three groups can be tested in the usual Type III form
with $\nu = 2\frac{1}{2}$ if $\chi^2 \ge 6.3$. For lower values $\frac{6}{7}\chi^2$ may be tested in that form with $\nu = 2$.

This test is independent of the law of distribution of the variables and is thus of general
application. It has to be remembered, however, that generality in these matters may
be offset by loss of sensitivity, and more searching tests may be required in certain cases.


Example 21.11 

The following table shows the deviations from a moving nine-year average of potato
yields in England and Wales for the years 1888-1935 (units are -^th ton) : — 


Year. 

Yield. 

Year. 

Yield. 

1 

Year. 

Yield. 

i 

Yc'ar. 

i 

Yield. 

1888 

- 6 

1900 

-IT 

1912 

- 15 T 

1 924 ! 

i 

1 

1 T ! 

89 

+ 2 P 

01 

-1- 6 P 

13 

h 3 P 

25 

2 r : 

90 

- 4 T 

02 

- 3 

14 

2 

26 

9 T 1 

91 

- 3 

03 

-IT 

15 

-1- 1 

27 

- - 3 ‘ 

92 

- 1 

04 

+ 2 P 

16 

— 2 T 

28 

1 9 /’ ; 

93 

f 6 P 

05 

0 T 

17 

■V 5 P 

29 

i 5 ; 

94 

- 2 T 

06 

-h 1 P 

18 

-1- 4 

30 ; 

r 1 1 

95 

-VIP 

07 

-IT 

19 

- 4 2’ 

31 

Id T 

96 

-f 3 

08 

4- 8 P 

20 

- 3 P 

32 

1 • : 

97 

- 6 T 

09 

4- 4 

21 

- 9 T 

33 

■1 2 1 

98 

4- 2 P 

10 

4- 3 P 

22 

-1 1 1 P 

34 

i 1 •"> 1 

99 

0 

11 

4- 4 P 

23 

- I 1 

35 

: ■ -1- ! 






.[ 




We have marked with P and T the peaks and troughs of the series. The observed
number of turning-points is 31 in a series of 48 terms. The expected number is, from
(21.68), $\frac{2}{3}(48 - 2) = 30.67$, almost exactly the number observed. No test of significance
is required.

The duration of phases is:

                       d = 1     d = 2     3 and over     Total
Observed                 20         6           4           30
Predicted (21.75)      18.75      8.07        3.18        30.00

Here, again, a test is hardly necessary. We find, in fact, $\chi^2 = 0.826$, $\frac{6}{7}$ of which for
$\nu = 2$ is not significant.

We conclude that these tests provide no evidence against the randomness of the series
and hence do not suggest any cyclical movement in the yields.
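
The predicted frequencies and the value of $\chi^2$ in this example may be reproduced as follows. This is a sketch under stated assumptions: the expected number of phases of length $d$ is taken as $2(N-d-2)(d^2+3d+1)/(d+3)!$, the class "3 and over" is obtained by difference from the observed total of 30 (which is what the printed figures appear to do), and the probability is read from the $\chi^2$ distribution with two degrees of freedom after multiplying by 6/7. The names are our own.

```python
import math


def expected_phases(d, n):
    """Expected number of phases of length d in a random series of n terms."""
    return 2 * (n - d - 2) * (d * d + 3 * d + 1) / math.factorial(d + 3)


N_TERMS = 48
observed = [20, 6, 4]                      # d = 1, d = 2, d = 3 and over
e1 = expected_phases(1, N_TERMS)
e2 = expected_phases(2, N_TERMS)
expected = [e1, e2, sum(observed) - e1 - e2]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
adjusted = 6 * chi2 / 7
# For two degrees of freedom the chance of exceeding x is exp(-x/2).
probability = math.exp(-adjusted / 2)

print([round(e, 2) for e in expected], round(chi2, 3), round(probability, 2))
```

The small difference from the printed 0.826 arises from the rounding of the predicted frequencies in the table.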
21.46. In the foregoing example we have treated the two values in 1923 and 1924
as a single value since they are equal. These so-called “ ties ” frequently occur in ranking 
work and are a great nuisance. In the present case there is only one, and any reasonable 
method of treating it will not affect the test. Where “ ties ” are numerous enough to 
make a serious difference some systematic method of treating them is desirable, particularly 
if more than two individuals are tied. They may be treated as a single observation, as 
in this case (although it would probably be better then to reduce N accordingly) ; or, 
preferably, they may be counted as a mean value, e.g. with a tied pair we should consider 
the first as greater than the second and then the second greater than the first, counting the 
number of turning-points or phases as one-half in each case and adding the two together. 
This, as in all similar ranking problems, makes the theoretical discussion of sampling very 
complicated, and if it is desired to make a precise use of significance tests a further
possibility is to assume that the tied members are ranked in the order most unfavourable to
the hypothesis under test, so as to be on the safe side.

Conditional Tests 

21.47. When several unknown parameters are concerned, it may be difficult to find
a sampling distribution dependent only on one of them which will form a basis for estimation 
or a test of significance. Sometimes, however, we can get rid of undesirable parameters 
by restricting the distribution in some way, and particularly by considering a distribution 
of samples which have some specified quality in common with the observed sample. Such 
distributions we shall, in Bartlett’s phrase, call conditional. Fisher expresses a similar 
idea by speaking of samples which have the same configuration. 

The most important application of this principle is in the testing of regression 
coefficients, which we shall consider in the next chapter. Here we give a simple illustration 
of the method for the Poisson distribution. 


Example 21.12 

Suppose we have two samples from populations which are known to give the Poisson
type of distribution but may have different parameters. We wish to determine whether
the populations could be identical.

Suppose the frequencies of successes in the two samples are $r_1$ and $r_2$. If $\lambda$ is the parameter
of the parent (assumed the same for each), the probabilities of the samples are

$$\frac{e^{-\lambda}\lambda^{r_1}}{r_1!} \quad \text{and} \quad \frac{e^{-\lambda}\lambda^{r_2}}{r_2!}$$

and their joint probability is accordingly

$$\frac{e^{-2\lambda}\lambda^{r_1 + r_2}}{r_1!\, r_2!} \tag{21.78}$$

This depends on $\lambda$ and does not help us in answering the question. However, for the
probability of a sample with $r_1 + r_2$ successes we have (since the sum of two Poisson variates
with parameters $\lambda_1$, $\lambda_2$ is distributed in the same form with parameter $\lambda_1 + \lambda_2$):

$$P\{r_1 + r_2 \mid \lambda\} = \frac{e^{-2\lambda}(2\lambda)^{r_1 + r_2}}{(r_1 + r_2)!},$$

and hence

$$\frac{P\{r_1, r_2 \mid \lambda\}}{P\{r_1 + r_2 \mid \lambda\}} = \frac{(r_1 + r_2)!}{2^{r_1 + r_2}\, r_1!\, r_2!} = \frac{r!}{2^r\, r_1!\, r_2!}$$

where $r = r_1 + r_2$.

Now in accordance with Bayes' theorem we have

$$P\{r_1, r_2 \mid \lambda\} = P\{r_1, r_2 \mid r_1 + r_2\}\, P\{r_1 + r_2 \mid \lambda\} \tag{21.79}$$

and hence

$$P\{r_1, r_2 \mid r\} = \frac{r!}{2^r\, r_1!\, r_2!} \tag{21.80}$$

Consequently, if we confine our attention to samples for which the total number of successes
is $r$, the probability of the observed $r_1$ and $r_2$ is independent of $\lambda$ and is, in fact, the corresponding
term in the binomial $(\frac{1}{2} + \frac{1}{2})^r$. The probability is clearly that of a partition of
$r$ into the observed $r_1$ and $r_2$, and if it is small we suspect the hypothesis that the samples
emanated from the same population.

This kind of conditional inference raises the same sort of point as we noticed in 17.44. 
We decide beforehand that, whatever r turns out to be, we will make the inference in the 
population of samples which yield that value of r. 
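
The conditional probability $r!/(2^r r_1! r_2!)$ of this example is simple to evaluate. In the sketch below the function names and the counts 3 and 12 are hypothetical, chosen only for illustration; the second function, which sums the probabilities of all partitions no more probable than the one observed, is one possible way of attaching a significance level and is our own assumption rather than a procedure laid down in the text.

```python
from math import comb


def conditional_probability(r1, r2):
    """Probability of the partition (r1, r2) of r = r1 + r2 when the parents are identical."""
    r = r1 + r2
    return comb(r, r1) / 2 ** r


def tail_probability(r1, r2):
    """Sum of the probabilities of all partitions of r no more probable than (r1, r2)."""
    r = r1 + r2
    observed = conditional_probability(r1, r2)
    tolerance = observed * 1e-12
    return sum(
        conditional_probability(a, r - a)
        for a in range(r + 1)
        if not conditional_probability(a, r - a) > observed + tolerance
    )


# Hypothetical frequencies of successes in the two samples.
print(conditional_probability(3, 12), tail_probability(3, 12))
```

Neither quantity involves the Poisson parameter, which is the point of confining attention to samples with the same total.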


Pitman’s Tests 

21.48. In the extreme conditional case we may consider an inference in a population
of samples the members of which are the same as those actually observed, the population
being given by permutations or partitions of the observed values. The tests of ranking
and periodicity given above are cases of this kind. A similar procedure has been advocated
by Fisher in the analysis of variance and the design of experiments, and will be considered
in due course. We now proceed to examine tests of the same nature proposed by Pitman
(1937a, 1938).

Suppose we have two sets of values $u_1 \ldots u_m$ and $v_1 \ldots v_n$ with means $\bar{u}$ and $\bar{v}$
and the mean of the two together equal to $z$. Given $m + n$ objects, there are
$(m+n)!/(m!\,n!)$ ways, say $N$, of separating them into two sets of $m$ and $n$ objects, of which
the given set is one. We call $|\bar{u} - \bar{v}|$ the spread of the separation. Since

$$m\bar{u} + n\bar{v} = (m + n)z,$$

we have also for the spread

$$\frac{(m+n)\,|\bar{u} - z|}{n} = \frac{(m+n)\,|\Sigma(u) - mz|}{mn}.$$

Take a probability $1 - \alpha = M/N$, where $M$ is an integer. If $R$ is a particular separation,
and the number of separations with spread not less than that of $R$ is not greater than $M$,
we call $R$ discordant. If there are $M$ or more with a greater spread we call it concordant.
A separation which is neither concordant nor discordant is called neutral. If $m = n$ the
separations occur in pairs with equal spreads, and we then take $M$ to be even. The
discordant separations are most easily picked out as those with the largest values of
$|\Sigma u - mz|$.

If the observed separation is arrived at by chance, the probability that it is discordant
is $M/N = 1 - \alpha$ when there are no neutral separations. If such exist, the probability
is less than $1 - \alpha$. Similarly the probability that a separation is concordant is $\alpha$,
or more, as the case may be.

Two samples $u_1 \ldots u_m$ and $v_1 \ldots v_n$ are said to be discordant, concordant or
neutral according as the separations $u$ and $v$ are so. Having selected our significance
points dependent on $\alpha$, and hence having fixed $M$, we can find for what values of the spreads
a pair of samples is discordant or otherwise, and hence whether our observed pair is so.
If they are discordant we reject the hypothesis that they came from the same population.

Example 21.13 (Pitman, 1937a)

Two samples have the following values:

0, 11, 12, 20
16, 19, 22, 24, 29.

Are they significantly different ?

There are 9 members altogether and hence $\binom{9}{4} = 126$ separations into samples of
five and four. We take $\alpha$ to be as near as possible to 0.95, corresponding to a 5-per-cent.
level of significance, and hence $M = 6$. We then find the groups which have the largest
values of the spread. We have $z = 17$, so that $mz = 68$, and using the form $|\Sigma u - 68|$
we find those groups of four from

0, 11, 12, 16, 19, 20, 22, 24, 29,

which give the maximum value to this quantity. They are:

Group.                |Σu - 68|
0, 11, 12, 16            29
0, 11, 12, 19            26
0, 11, 12, 20            25
29, 24, 22, 20           27
29, 24, 22, 19           26
29, 24, 20, 19           24

The group 0, 11, 12, 20 gives the fifth largest spread, and so with $M = 6$ the observed
separation is discordant. Our inference is that the samples come from different populations.
Only in four other cases out of 126 should we get so large a spread in samples from
the same population.
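
The enumeration of separations in Example 21.13 is small enough to carry out completely. The sketch below is illustrative only (function names are ours); it counts the separations whose spread is not less than that observed, working with $|(m+n)\Sigma u - m \times \text{total}|$ so that the arithmetic stays in whole numbers.

```python
from itertools import combinations

U = [0, 11, 12, 20]
V = [16, 19, 22, 24, 29]


def separation_count(u, v):
    """Return (number of separations with spread >= observed, total number)."""
    pooled = list(u) + list(v)
    m = len(u)
    size = len(pooled)
    total = sum(pooled)

    def scaled_spread(group):
        # Proportional to |sum(group) - m z|, where z = total / size.
        return abs(size * sum(group) - m * total)

    observed = scaled_spread(u)
    spreads = [scaled_spread(group) for group in combinations(pooled, m)]
    return sum(1 for s in spreads if s >= observed), len(spreads)


at_least, number = separation_count(U, V)
M = 6
print(at_least, number, "discordant" if not at_least > M else "not discordant")
```

With $M = 6$ the observed separation is classed as discordant, in agreement with the text.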


21.49. The extended use of the above test is barred by practical inconvenience,
but an approximate form based on a different measure of discordance may be used. We
now put

$$w = \frac{m(\bar{u} - z)^2}{(N - m)\mu_2} \tag{21.82}$$

where $\mu_2$ is the variance of the samples taken together and is thus a constant. The function
$w$ is hence linear in $(\bar{u} - z)^2$, the device of squaring, as usual, getting rid of difficulties
associated with the use of the modulus $|\bar{u} - z|$. $N$ here refers to the total sample number
$m + n$.

Now, for the moments of $\bar{u} - z$ we may use the results of 11.26 (vol. I, p. 284), giving
the moments of the mean in sampling from a finite population; for $z$ is the population
mean. Replacing $n$ in the formulae of that section by $m$ and putting $N = m + n$,
we have

$$E(\bar{u} - z) = 0$$

$$E(\bar{u} - z)^2 = \frac{\mu_2}{N - 1}\,\frac{N - m}{m}$$

$$E(\bar{u} - z)^4 = \frac{(N - m)\,[\{N^2 + N - 6m(N - m)\}\mu_4 + 3N(N - m - 1)(m - 1)\mu_2^2]}{m^3 (N - 1)(N - 2)(N - 3)}$$

and hence for the first two moments of $w$ we find

$$E(w) = \frac{1}{N - 1} \tag{21.83}$$

$$E(w^2) = \frac{3}{N^2 - 1}(1 + \theta), \tag{21.84}$$

where

$$\theta = \frac{N + 1}{3(N - 2)(N - 3)} \left\{\frac{N(N + 1)}{m(N - m)} - 6\right\} \left\{\gamma_2 + \frac{6}{N + 1}\right\} \tag{21.85}$$

$\gamma_2 = \mu_4/\mu_2^2 - 3$ referring to the measure of kurtosis.

For fixed $N$ the modulus of the second factor in (21.85) will be found to have a maximum
at $2(N - 2)/N$ when $m = \frac{1}{2}N$, and it takes this value again at

$$\left(\frac{N - 2m}{N}\right)^2 = \frac{N - 2}{2N - 1},$$

giving $m/n = \frac{1}{5}$ or 5 for $N = 14$ and wider limits for larger $N$. It will also be found
that for $N > 6$ the factor is not greater in absolute value than $2(N - 2)/N$ if
$\frac{1}{5} \le m/N \le \frac{4}{5}$, i.e. unless one sample is more than four times as big as the other. Thus
for such values and $\gamma_2$ not large, $\theta$ is small, and approximately

$$E(w^2) = \frac{3}{N^2 - 1} \tag{21.86}$$

Similarly, using the fact that for large $m$ and $N$

$$E(\bar{u} - z)^{2r} = 1 \cdot 3 \cdot 5 \ldots (2r - 1)\left(1 - \frac{m}{N}\right)^r \frac{\mu_2^r}{m^r},$$

we find approximately

$$E(w^3) = \frac{15}{(N - 1)(N + 1)(N + 3)} \tag{21.87}$$

The moments given by (21.83), (21.86) and (21.87) are those of the B-distribution

$$dF = \frac{1}{B(\frac{1}{2}, \frac{1}{2}N - 1)}\, w^{-\frac{1}{2}} (1 - w)^{\frac{1}{2}N - 2}\, dw, \tag{21.88}$$
which can therefore be used to approximate to the distribution of $w$. In point of fact the
approximation seems to be remarkably close.

$w$ may also be written

$$w = \frac{\dfrac{mn}{m + n}(\bar{u} - \bar{v})^2}{\Sigma(u - \bar{u})^2 + \Sigma(v - \bar{v})^2 + \dfrac{mn}{m + n}(\bar{u} - \bar{v})^2} \tag{21.89}$$

which shows that $w \le 1$.

We also have

$$\frac{w}{1 - w} = \frac{\dfrac{mn}{m + n}(\bar{u} - \bar{v})^2}{\Sigma(u - \bar{u})^2 + \Sigma(v - \bar{v})^2} \tag{21.90}$$

and it is instructive to observe that the function on the right is the same as that of
$t^2/(m + n - 2)$, with $t$ as in (21.32) and a few changes of notation. A transformation of
(21.88) to "Student's" form will in fact show that we can test $(m + n - 2)\,w/(1 - w)$ as $t^2$
in the $t$-distribution with $\nu = m + n - 2$; for (21.88) then becomes

$$dF \propto \frac{dt}{\left(1 + \dfrac{t^2}{m + n - 2}\right)^{\frac{1}{2}(m + n - 1)}} \tag{21.91}$$

where

$$t^2 = (m + n - 2)\,\frac{w}{1 - w} \tag{21.92}$$
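
The relations between $w$, the pooled variance and $t$ can be verified numerically on the data of Example 21.13. The sketch below uses our own function names; it computes $w$ both from the pooled mean and variance, as $m(\bar{u}-z)^2/\{(N-m)\mu_2\}$, and from the two samples separately, forms $t$ from $t^2 = (m+n-2)w/(1-w)$, and checks by complete enumeration that the mean of $w$ over all separations is $1/(N-1)$.

```python
import math
from itertools import combinations

U = [0, 11, 12, 20]
V = [16, 19, 22, 24, 29]


def w_from_samples(u, v):
    """w as the ratio of the between-sample term to the total sum of squares."""
    m, n = len(u), len(v)
    u_bar, v_bar = sum(u) / m, sum(v) / n
    between = m * n / (m + n) * (u_bar - v_bar) ** 2
    within = sum((x - u_bar) ** 2 for x in u) + sum((y - v_bar) ** 2 for y in v)
    return between / (within + between)


def w_from_pooled(u, v):
    """w computed from the pooled mean z and the pooled variance mu_2."""
    pooled = list(u) + list(v)
    size, m = len(pooled), len(u)
    z = sum(pooled) / size
    mu2 = sum((x - z) ** 2 for x in pooled) / size
    return m * (sum(u) / m - z) ** 2 / ((size - m) * mu2)


def mean_w_over_separations(u, v):
    """Average of w over every separation of the pooled values into m and n."""
    pooled = list(u) + list(v)
    indices = range(len(pooled))
    values = []
    for chosen in combinations(indices, len(u)):
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in indices if i not in chosen]
        values.append(w_from_samples(first, second))
    return sum(values) / len(values)


w = w_from_samples(U, V)
t = math.sqrt((len(U) + len(V) - 2) * w / (1 - w))
print(round(w, 4), round(w_from_pooled(U, V), 4), round(t, 3))
print(mean_w_over_separations(U, V), 1 / (len(U) + len(V) - 1))
```

The value of $t$ would be referred to "Student's" distribution with $m + n - 2 = 7$ degrees of freedom.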


21.50. A test of a similar kind may be evolved for the product-moment correlation.
Suppose we have two samples $x_1 \ldots x_n$ and $y_1 \ldots y_n$ and calculate

$$r = \frac{\frac{1}{n}\Sigma xy - \bar{x}\bar{y}}{\sqrt{(\text{var}\, x\; \text{var}\, y)}}$$

for every possible pairing of the $x$'s and $y$'s, $n!$ in number. As before, if we choose an
$\alpha$ and hence a number $M$ such that $1 - \alpha = M/n!$ we may determine those pairings for
which $r$ is greatest and reject the hypothesis that $x$ and $y$ are independent in such cases
if they fall among the $M$ greatest. Since the denominator of $r$ is constant, this is equivalent
to attributing significance to the values of $|\Sigma xy - n\bar{x}\bar{y}|$ which exceed a given value
determined by $\alpha$.

Taking $\bar{x} = \bar{y} = 0$, without loss of generality we find

$$E(r) = 0 \tag{21.93}$$

$$E(r^2) = \frac{1}{n^2\, \text{var}\, x\; \text{var}\, y}\, E(\Sigma xy)^2 = \frac{1}{n - 1}, \tag{21.94}$$
and similarly, if $\gamma_1$, $\gamma_2$ are the modified measures of skewness and kurtosis for $x$ (expressed
in terms of $k$-statistics, i.e. $\gamma_1 = k_3/k_2^{3/2}$, $\gamma_2 = k_4/k_2^2$) and $\gamma_1'$ and $\gamma_2'$ those for $y$, it will be
found that

$$E(r^3) = \frac{n - 2}{n(n - 1)^2}\, \gamma_1 \gamma_1'$$

$$E(r^4) = \frac{3}{(n - 1)(n + 1)} + \frac{(n - 2)(n - 3)}{n(n + 1)(n - 1)^3}\, \gamma_2 \gamma_2'.$$

Thus, approximately, we have

$$E(r) = E(r^3) = 0 \tag{21.95}$$

$$E(r^2) = \frac{1}{n - 1} \tag{21.96}$$

$$E(r^4) = \frac{3}{(n - 1)(n + 1)}. \tag{21.97}$$

These are the first four moments of the distribution

$$dF = \frac{1}{B(\frac{1}{2}, \frac{1}{2}n - 1)}\,(1 - r^2)^{\frac{1}{2}(n - 4)}\, dr, \qquad -1 \le r \le 1. \tag{21.98}$$

Thus $r$ may be tested in this distribution or equivalently, putting

$$t = r\left\{\frac{n - 2}{1 - r^2}\right\}^{\frac{1}{2}}$$

in "Student's" form with $\nu = n - 2$.

In particular, if the numbers $x$ and $y$ reduce to rankings, we have the test already
introduced in 21.41. Compare also the result given for the distribution of Spearman's
$\rho$ in 16.18 (vol. I, p. 401).
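
For small $n$ the pairings can be enumerated completely, and the results $E(r) = 0$ and $E(r^2) = 1/(n-1)$ verified directly. The sketch below is illustrative: the two sets of six values are arbitrary numbers of our own choosing, and the function names are hypothetical.

```python
import math
from itertools import permutations


def correlation(x, y):
    """Product-moment correlation with divisor n in the variances."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - x_bar * y_bar
    var_x = sum((a - x_bar) ** 2 for a in x) / n
    var_y = sum((b - y_bar) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)


def pairing_moments(x, y):
    """Mean of r and of r squared over all n! pairings of the x's with the y's."""
    values = [correlation(x, pairing) for pairing in permutations(y)]
    count = len(values)
    return sum(values) / count, sum(v * v for v in values) / count


def student_t(r, n):
    """Transformation of r to Student's form with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Arbitrary illustrative values (not from the text).
X = [1.0, 2.5, 3.0, 4.5, 7.0, 8.0]
Y = [2.0, 1.0, 4.0, 3.5, 6.0, 9.0]

mean_r, mean_r2 = pairing_moments(X, Y)
r_observed = correlation(X, Y)
print(mean_r, mean_r2, 1 / (len(X) - 1))
print(r_observed, student_t(r_observed, len(X)))
```

The first two moments hold exactly whatever the observed values, which is why the test does not depend on the form of the parent.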


The Combination of Tests 

21.51. It sometimes happens that we have a number of tests of significance, all
yielding various probabilities, which we wish to express as a single probability. Suppose,
for instance, that we conduct an experiment five times and that some test, such as that
of the mean, gives probabilities to the observed deviations of 0.2, 0.8, 0.01, 0.1, 0.03. In
the ordinary way two of these values would be regarded as significant and the other three
not. What conclusion are we to draw as to the five taken together ?

Suppose we have $k$ values of the probability, $p_1, \ldots, p_k$. The distribution of any
particular $p$ is rectangular, i.e.

$$dF = dp, \qquad 0 \le p \le 1.$$

Hence, if $x = -\log p$ the distribution of $x$ is

$$dF = e^{-x}\, dx, \qquad 0 \le x \le \infty$$

and its characteristic function is

$$\int_0^\infty e^{itx} e^{-x}\, dx = \frac{1}{1 - it}.$$

Hence if we write

$$\lambda = -\sum_{j=1}^{k} \log p_j \tag{21.100}$$

the distribution of $\lambda$ has a characteristic function

$$\phi(t) = \frac{1}{(1 - it)^k} \tag{21.101}$$

and is therefore given by

$$dF = \frac{e^{-\lambda}\lambda^{k-1}}{\Gamma(k)}\, d\lambda. \tag{21.102}$$

Putting

$$M^2 = 2\lambda = -2\Sigma \log p = -2\log(p_1 p_2 \ldots p_k) \tag{21.103}$$

we see that the distribution of $M$ is

$$dF \propto \exp(-\tfrac{1}{2}M^2)\, M^{2k-1}\, dM$$

or $M^2$ is distributed as $\chi^2$ with $\nu = 2k$ degrees of freedom.

Example 21.14 (K. Pearson, 1933b, quoting data from E. M. Elderton, 1933).

Pairs of boys were selected in various age-groups, one of each pair being fed
on raw, the other on pasteurised milk. The differences in gain in weight are shown in
the following table, together with the standard errors of the differences obtained from
sampling theory.

   (1)            (2)          (3)                (4)           (5)              (6)
   Age          Number      Mean Difference     Standard      Probability      log10 p_k.
 (central      of Pairs.    in Weight           Error of      of Observed
  value                     Gained, Raw less    Difference.   Difference or
 in years).                 Pasteurised.                      Greater, p_k.

 [illegible]  [illegible]     - 0.066            0.054          0.8888          $\bar{1}.9488$
 [illegible]  [illegible]     + 0.022            0.053          0.3409          $\bar{1}.5326$
 [illegible]      71          - 0.003            0.052          0.5239          $\bar{1}.7193$
 [illegible]      77          + 0.011            0.055          0.4207          $\bar{1}.6240$
 [illegible]  [illegible]     + 0.002            0.057          0.4840          $\bar{1}.6849$

                                                                Total           $\bar{2}.5096$

The probabilities in column (5) are obtained from the normal integral. We have

$$M^2 = -2\Sigma \log_e p = -\frac{2\Sigma \log_{10} p}{\log_{10} e} = 6.86, \qquad \nu = 10.$$

The probability of a value of $\chi^2 \ge 6.86$ for $\nu = 10$ is about 0.74, and the test as a whole
does not support the hypothesis of a differential effect on feeding between the two kinds
of milk.
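
The combination of the five probabilities in this example can be checked without tables, since for an even number of degrees of freedom the $\chi^2$ integral reduces to a finite sum. The following is a minimal sketch with names of our own choosing.

```python
import math


def combined_chi_square(probabilities):
    """Minus twice the sum of natural logarithms; chi-square with 2k degrees of freedom."""
    return -2 * sum(math.log(p) for p in probabilities)


def chi_square_tail_even(x, dof):
    """Chance that chi-square with an even number dof of degrees of freedom exceeds x."""
    if dof % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(dof // 2))


P_VALUES = [0.8888, 0.3409, 0.5239, 0.4207, 0.4840]

m_squared = combined_chi_square(P_VALUES)
tail = chi_square_tail_even(m_squared, 2 * len(P_VALUES))
print(round(m_squared, 2), round(tail, 2))
```

The same two functions apply to any set of independent probabilities each of which is rectangularly distributed on the hypothesis tested.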
Nuisance Parameters

21.52. From the foregoing it will have been clear that in the theories of both estima- 
tion and significance one of the main problems is to find a distribution which is independent 
of certain unknown parameters in the parent population. Parameters of this kind, neces- 
sary as they are in the specification of the parent and the precise formulation of our problem, 
can be a nuisance when we are seeking to make exact statements about some other para- 
meter on which interest is focussed. For this reason they have been named nuisance 
parameters. It may be useful if at this point we summarise the methods available for 
getting rid of them. 

(a) First of all there is the process of "Studentisation", whereby we can remove
scale parameters from the sampling distribution by a suitable choice of statistic. (Cf.
19.26.)

(b) Secondly, we may restrict the inference to a sub-population which is conditioned
by having certain values in common with the observed sample. It sometimes happens 
that the distribution in this sub-population does not contain the nuisance parameters, 
whereas a distribution in the full population would do so (21.47). 

(c) In the comparison of two samples, or even the testing of a single sample involving
an unknown mean, that parameter may be eliminated by differencing (21.27). As regards
the case of the single sample, it is clear that if $x_1 \ldots x_n$ are independent and $n$ is even,
the values $x_1 - x_2$, $x_3 - x_4$, $\ldots$, $x_{n-1} - x_n$ will also be independent and be distributed
with zero mean (though of course there are only $\frac{1}{2}n$ of them).

(d) Transformations of the variate may sometimes either eliminate the nuisance
parameter altogether or reduce its importance. The most noteworthy case is Fisher's
transformation of the correlation coefficient (14.18, vol. I, p. 345). The transformed
function $z - \zeta$ is distributed nearly normally with variance $1/(n - 3)$, so that the difference
of two correlations when transformed does not involve the common value of $\zeta$.
(Cf. Example 14.8.)

(e) We may find distributions which are independent of the unknown parameters, 
and even of the population, by using the methods of ranking or considering partitions 

(21.41, 21.48). 

(f) The fiducial argument, in at least one known case, gives a test independent of
unknown parameters, namely the Behrens test (20.13).

It must be realised, however, that all these types of inference do not stand on equal
footings. In particular (e) requires further examination, as we proceed to show.

21.53. We may now review the many different tests which have been described in 
this chapter and consider more closely the type of reasoning on which they are based. 
We may group our tests broadly into two classes, those which give a direct test of a given 
value of a parent parameter and those which do not. 

The first class rests on a type of inference which we have discussed fully in connection
with the problem of estimation. There is, in fact, only a difference in viewpoint, and little
or none in essential ideas, between estimating a parameter by assigning a range to acceptable
values (whether by confidence intervals or fiducial intervals) and ascertaining whether
some prior value lies in that range. The significance of parameters in large samples, the
test of the mean in normal samples by "Student's" distribution, the test of a correlation
coefficient in normal samples, and others of the same kind relating to a specified parameter
have the same logical foundation as the theory of confidence intervals or the theory of
fiducial intervals, whichever is preferred. They all provide for the consideration of alternative
values of the parameter.

21.54. The second group of tests are not, on the face of it, concerned with the value
of a parameter in a parent population, and some of them take no account of possible alternative
hypotheses. Consider, for example, a test of normality or a test of randomness.
The hypothesis is that the population is normal or the sampling is random, as the case
may be, but this does not specify a parameter. What alternatives to normality or to
randomness are we considering, if any ? We must have the existence of such alternatives
in mind, however vaguely, for otherwise we should not be testing these particular hypotheses.
But can we say what they are ? And if not, do our inferences remain valid ?
When working with a probability $\alpha$ shall we still be right in a proportion $\alpha$ of the cases in
the long run ?


21.55. The kind of argument we have used in all these cases is this : on the given 
hypothesis the observed sample and all samples providing a greater value of the statistic 
being used for the test have a small probability. Therefore we reject the hypothesis. 

We may note at once that in rejecting the hypothesis we do so in favour of another
hypothesis for which the observations are more probable. We may not express this thought
explicitly, but it is there. The various statistics we use for testing normality, for instance
$b_1$, can arise with greater probability from other populations which are skew or have a
marked deviation from mesokurtosis; the fact is assumed as self-evident (as indeed it
is) and hence, if the statistic is improbable for the normal case there will be non-normal
cases of greater probability. We remark, nevertheless, that the actual probability $\alpha$ is
calculated on the normal hypothesis and does not hold for the non-normal cases. Thus
we can no longer assert that we are right in proportion $\alpha$ of the cases. We are therefore
relying on a less definite principle of inference to the effect that we reject a hypothesis
which gives an improbable value to observation, provided that there exists some other
hypothesis which gives a more probable value.


21.56. A similar argument applies to tests of randomness. It is obvious that many 
other methods of generating a series exist which give a greater probability to a systematic
series than the random method, and in rejecting the latter we do so more or less consciously
in favour of the former. Our intuitive feelings on the point lead us to apply one test when
we have the possibility of systematic order in mind (the ranking test) and another when
we are interested in oscillations (the phase test). What we are doing, in effect, is selecting
the test of randomness which we feel to discriminate best between the hypothesis of
randomness and the alternative possibilities.


21.57. Although, therefore, much remains to be done in putting tests of normality, 
randomness and goodness of fit on a formal logical basis, there do not appear to be any 
serious difficulties in doing so insofar as the specification of alternative hypotheses is
concerned. But there remains the difficulty hinted at at the beginning of 21.55. In the
majority of cases we have a probability $1 - \alpha$ that the observed statistic U will be exceeded,
and if this is small reject the hypothesis. But why exceeded? Why reject the hypothesis
because of the improbability of a number of events which have not happened?

Here also it seems that a closer inquiry into the logic of the process would be worth
while. We have seen how it can be justified by confidence-interval or fiducial theory
when a parameter is under consideration. When no parameter is specified, the process
must, in the present state of our knowledge, rest on more intuitive ideas. My own view 
is that, in a vague kind of way, we are really considering the range of values of a parameter 
without realising it. In selecting a statistic to carry out the test, we usually relate it to 
the sort of effect we are expecting to divert the real state of affairs from those of 
our hypothesis. For instance, if we suspect cyclical effects in a random series we base 
a test on oscillations in that series. The further the series deviates from randomness the 
greater will be the value of our statistic ; and consequently, if we could measure deviation 
from randomness (in the direction of cyclicality), we should have a parameter which could 
be located in a range in the manner of confidence intervals. Such a range would exclude 
the larger values of our statistic if it can be regarded in any sense as estimating the para- 
meter (or, more generally, as increasing with it) ; and hence the procedure of rejecting the 
hypothesis if the statistic is among these large values may be justified. 

21.58. It is for this reason that we began the chapter by defining tests of significance 
in relation to a parameter-value given a priori. It seems probable that in the ultimate 
analysis no other definition will be satisfactory. The fact that in this chapter we have 
given tests of hypotheses which do not appear to specify a parameter value is, I think, 
merely a reflection of the fact that the nature of those hypotheses and the inferences about 
them are not usually understood clearly but are based on more or less intuitive ideas. It 
is probable that many of these ideas are sound and can be given explicit logical foundation ; 
but the matter awaits investigation by the statistical logician. 

21.59. There remains for consideration the type of inference used in Pitman’s tests 
(21.48 and 21.49). These are of the character of tests of randomness. Given a set of 
values, we consider all the arrangements in which they could have happened and reject 
the hypothesis if the observed arrangement is improbable. Here again, as it seems to me, 
there is a suppressed series of alternative hypotheses which would make the observed 
value more probable ; and in choosing the test, such as the “ spread ” or the high value 
of a correlation, we are intuitively relating the magnitude of a statistic to the deviation 
from randomness. Pitman himself has shown, however, that when the hypothesis is 
definite and specifies the difference of two means, the tests give confidence intervals in the 
ordinary way (cf. Exercise 21.15.) 

We shall resume the general theory of tests of significance in Chapter 26. 

NOTES AND REFERENCES 

For the use of the t-distribution in non-normal cases see Geary (1936b) and Bartlett
(1935a), the latter of whom shows that, for moderate samples, departures from
mesokurtosis are not very serious. For approximations to t in the normal case see Hendricks
(1936) and Hotelling and Frankel (1938). For approximations to the z-distribution see
Cochran (1940a), Cornish and Fisher (1937), and Paulson (1942). See also references to
Chapter 23.

For the further theory of the $\chi^2$-test see Neyman and Pearson (1928, 1931a) and for
another test of goodness of fit Neyman (1937a). The theory of 21.44 has been studied
by a number of writers, notably by Andre (1884), Kermack and McKendrick (1936, 1937),
and Wallis and Moore (1941).

The amalgamation of tests given in 21.51 was apparently first given by Fisher in an 
early edition of Statistical Methods for Research Workers and was studied in detail by
K. Pearson (1933b) under the title of the $P_\lambda$-test, and by E. S. Pearson (1938).

For a test of significance of the difference of two variances in samples from a bivariate
normal population see Hirschfeld (1937), Finney (1938), Pitman (1939c), Morgan (1939),
and De Lury (1938); and see Exercise 21.3.

For the tests by Pitman, see his papers of 1937a, 1938. The similar problem in the
testing of homogeneity in the analysis of variance has also been studied; see references
to Chapters 23 and 24.

For the test of difference of means when variances are unequal from the point of view
of confidence intervals see Welch (1938b) and the appendix to this paper by Miss Tanburn.


EXERCISES 

21.1. For the population represented approximately by 


dF 




1 _ (3a: — x^)\ e 2 dx, 


show that, if k| is negligible, the joint probability of a sample Xi 
if KTs is zero by a term 


1 ^ 

■"IT 0 


{2jtf 

By the transformation 




x.> 


20 : 3 ) 

. • -1- X,,) 


sin (/>i sin ^0 
sin </>! cos <j£>o 
cos <f>i 


x„ differs from that 


(Xj) — 3 (xj) > exp ( — -h 2^ Xj) dxi . . . dx^. 


and the further transformation 

vy, p sin 3 sin (/>„, , 

ya = p sin ;5 sin , 

yz = P sin 3 sin , 

Vn- l = P COS 

show that the corrective term to the distribution of “ Student s t is 


dt 




3 


PpZ -p 


3w 


V 


tp ) exp 




P 

2 


r-" 


p^dp 


and hence obtain equation ( 21 . 11 ). 




21.2. By the polar transformation of the type of the previous exercise applied to
all n variates show that if a random sample is drawn from a normal population with zero
mean the frequency element may be written as

pW.— 1 g— 4 p“ y,p d(f>o sin d<^x sin^ (562 d<f>z • - • sin <56^,— 2 ^ 4 *n— 2 ’ 


n 
Hence if w


S I X 
ns 


■, -vvliere 5 ^ is the sample variance, the distribution of w is independent 


of that of s. Hence show that for the distribution of w, writing a 


■Ji 




r {n 


i“2 = 


/^3 = } j 

jUi = ^ -f {8a^ + 3) ■+■ 6a^ j 


n 2 


n 


Hence show that for n = 50, $\sqrt{\beta_1} = -0.24$ and $\beta_2 = 3.10$, indicating fairly rapid tendency
to normality.

(Geary, 1935a). 


21.3. Show that in samples from a normal bivariate population 


dF oc exp 


1 fx^ 2pxy 
2 ( 1 — 0 ^ 1 0-2 Gz 


dx dy, 


the functions 


X.: 

Uj (- 


— , Uj 


R 


Vi 

are distributed independently and that their correlation coefficient R may be written 

a — a 

{ {a d- a)^ — 4aar^}’ 

„ ^ 

orf ^{y'-yV^ 

and r is the correlation between the observed x's and y's. Hence show that 


where 


^ _ i2-v/(^^ — 2) _ {a — a)'\/{n — 2) 

“ V(1 ~ a/{4(1 ~r^)aoc} 

is distributed as “ Student’s ” t with n — 2 degrees of freedom. Show how to test the 
ratio a from this result. 


(Pitman, 1939c. The test has the remarkable property of being independent of the
parent correlation $\rho$.)


21.4. If an even number n of members of a sample come from a population with 
mean y, show how to find a sample of half the size distributed with twice the variance 
about zero mean. Hence show how to extend the result of Exercise 21.2 to the case where 
the population mean is not zero. 


21.5. If a parameter admits of a sufficient estimator, show that a test of its significance
can be derived direct from the likelihood function. 


21.6. Derive equations (21.47) and (21.48).


21.7. Let $l_{11}, l_{12}, \ldots, l_{1,n-1}$ be $(n - 1)$ linear functions of the observations which
are orthogonal to one another and to $\bar{x}_1$, and let them have zero mean and variance $\sigma_1^2$.
Similarly define $l_{21}, \ldots, l_{2,n-1}$.

Then, in two samples of n from normal populations with equal means and variances
$\sigma_1^2$ and $\sigma_2^2$, the function

$$\frac{\sqrt{n}\,(\bar{x}_1 - \bar{x}_2)}{\left\{\sum_j (l_{1j} + l_{2j})^2 / (n - 1)\right\}^{1/2}}$$

will be distributed as "Student's" t with $n - 1$ degrees of freedom.

(Bartlett, 1937c, and Welch, 1938b. The test does not depend on the ratio $\sigma_1^2/\sigma_2^2$ and
can be extended to the case of unequal sample numbers, but only at the expense of losing
efficiency in the sense that the degrees of freedom number one less than the lower of the sample
numbers.)


21.8. Given two samples of $n_1$, $n_2$ members from normal populations with unequal
variances, show that by picking $n_1$ members at random from the $n_2$ (where $n_2 > n_1$) and
pairing them at random with the members of the first sample, a test of significance of
difference of means can be based on "Student's" distribution independently of the
variance ratio in the populations. (This test, again, is exact, but sacrifices the information of
$n_2 - n_1$ members of the second sample.)


21.9. If z is the ratio of the sample mean to sample standard deviation in normal 
samples, and n is large enough for the distribution of the variance to be regarded as normal, 
show that 




Cn V(2'W) 


V( 2 “ + 2 ) V{ + 2 ( 9 ^ - 1 ) } 

is distributed approximately normally with zero mean and unit variance, where 


c 


n 




(Hendricks, 1936.) 


21.10. If x, y have a continuous frequency function f(x, y), their characteristic
function is

$$\phi(u, v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp(iux + ivy)\, f(x, y)\, dx\, dy.$$

Show that the distribution of x when y is given has a characteristic function

$$\phi(u \mid y) = \frac{\int_{-\infty}^{\infty} \phi(u, v)\, e^{-ivy}\, dv}{\int_{-\infty}^{\infty} \phi(0, v)\, e^{-ivy}\, dv}.$$

(Bartlett, 1938b.)


21.11. If a set of parameters $\theta_1, \ldots, \theta_p$ admit of a set of sufficient estimators, show
that conditional inferences independent of $\theta_1, \ldots, \theta_p$ are possible, the conditions being
22.3. We may also consider the more general curves typified by

$$Y = \frac{\int_{-\infty}^{\infty} y^r f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}, \tag{22.4}$$

the regression now being of the rth moment of y on x. If r = 1 we have the regression
of the first moment, or simply the regression. If r = 2 and y is measured from the mean
we have the so-called scedastic curve of y on x,


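$$Y = \frac{\int_{-\infty}^{\infty} (y - \bar{y}_X)^2 f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}, \tag{22.5}$$

which shows how the variance of y varies with x. Other forms which have been studied
are the clitic curve

$$Y = \frac{\int_{-\infty}^{\infty} (y - \bar{y}_X)^3 f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy} \tag{22.6}$$

and the kurtic curve

$$Y = \frac{\int_{-\infty}^{\infty} (y - \bar{y}_X)^4 f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}. \tag{22.7}$$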


These curves correspond to the moments of a univariate distribution, and tlie main 
characteristics of a bivariate form may be studied with their aid in much the same way 
as the lower moments can be used to summarise the properties of a univariate form. 
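
For numerically specified material the integrals in these curves become sums within each array of x. The following is a minimal sketch of that computation, not part of the original text: the function name `conditional_moment_curves` and the small data set are hypothetical, and the moments are taken about the array mean with divisor equal to the array frequency.

```python
from collections import defaultdict


def conditional_moment_curves(pairs):
    """For each distinct x, return (mean, 2nd, 3rd, 4th central moment) of y.

    The four values are points on the regression, scedastic, clitic and
    kurtic curves respectively (moments about the array mean).
    """
    arrays = defaultdict(list)
    for x, y in pairs:
        arrays[x].append(y)

    curves = {}
    for x, ys in sorted(arrays.items()):
        n = len(ys)
        mean = sum(ys) / n
        # central moments of order 2, 3, 4 within this array
        moments = [sum((y - mean) ** r for y in ys) / n for r in (2, 3, 4)]
        curves[x] = (mean, moments[0], moments[1], moments[2])
    return curves


# Hypothetical illustration data: (x, y) pairs grouped in three arrays.
data = [(1, 2.0), (1, 4.0), (2, 3.0), (2, 5.0), (2, 7.0), (3, 10.0)]
curves = conditional_moment_curves(data)
for x, values in curves.items():
    print(x, values)
```

Plotting the first entry of each tuple against x gives the ordinary regression of the mean; the second entry gives the scedastic curve.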


22.4. It is interesting to remark that, just as we can find the moments direct from 
the characteristic function, so also we may ascertain the regressions of moments from
the bivariate characteristic function, even when the distribution function itself is not 
explicitly given. 

Let us write the frequency function in the form

$$f(x, y) = g(x)\, g_x(y), \tag{22.8}$$

where $g(x)$ is the total frequency for any given x and $g_x(y)$ is the frequency of y for any
given x. In the notation of the theory of probability we should write this

$$f(x, y) = g(x)\, g(y \mid x).$$

The characteristic function of x and y is then

$$\phi(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp(it_1 x + it_2 y)\, g(x)\, g_x(y)\, dx\, dy
= \int_{-\infty}^{\infty} e^{it_1 x}\, g(x)\, \phi_x(t_2)\, dx, \tag{22.9}$$

where

$$\phi_x(t_2) = \int_{-\infty}^{\infty} e^{it_2 y}\, g_x(y)\, dy \tag{22.10}$$

and is the c.f. of y for a given x.
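If the rth moment of y about the origin for a given x is $\mu'_{rx}$ we have

$$\mu'_{rx} = (-i)^r \left[\frac{\partial^r \phi_x(t_2)}{\partial t_2^r}\right]_{t_2 = 0},$$

and hence, from (22.9),

$$(-i)^r \left[\frac{\partial^r \phi(t_1, t_2)}{\partial t_2^r}\right]_{t_2 = 0} = \int_{-\infty}^{\infty} e^{it_1 x}\, g(x)\, \mu'_{rx}\, dx. \tag{22.11}$$

Thus, by the Inversion Theorem,

$$g(x)\, \mu'_{rx} = \frac{(-i)^r}{2\pi} \int_{-\infty}^{\infty} e^{-it_1 x} \left[\frac{\partial^r \phi(t_1, t_2)}{\partial t_2^r}\right]_{t_2 = 0} dt_1, \tag{22.12}$$

subject, of course, to conditions of existence. This gives us the required expression for
$\mu'_{rx}$ in terms of x, and the regression can be written down at once.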


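22.5. Since

$$\phi(t_1, t_2) = \exp\left\{\sum_{j,k} \kappa_{jk} \frac{(it_1)^j (it_2)^k}{j!\, k!}\right\},$$

we have

$$\left[\frac{\partial \phi}{\partial t_2}\right]_{t_2 = 0}
= i \exp\left\{\sum_{j=1}^{\infty} \kappa_{j0} \frac{(it_1)^j}{j!}\right\} \sum_{j=0}^{\infty} \kappa_{j1} \frac{(it_1)^j}{j!}
= i\, \phi(t_1, 0) \sum_{j=0}^{\infty} \kappa_{j1} \frac{(it_1)^j}{j!}, \tag{22.13}$$

and $\phi(t_1, 0)$ may be written $\phi(t_1)$, being the characteristic function of $g(x)$. We also
have, subject to existence conditions,

$$D^j g(x) = \frac{d^j g(x)}{dx^j} = \frac{(-i)^j}{2\pi} \int_{-\infty}^{\infty} t_1^j\, e^{-it_1 x}\, \phi(t_1)\, dt_1. \tag{22.14}$$

Hence, from (22.12), (22.13) and (22.14) we find

$$g(x)\, \mu'_{1x} = \frac{-i}{2\pi} \int_{-\infty}^{\infty} e^{-it_1 x} \left[\frac{\partial \phi(t_1, t_2)}{\partial t_2}\right]_{t_2 = 0} dt_1
= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it_1 x}\, \phi(t_1) \sum_{j=0}^{\infty} \kappa_{j1} \frac{(it_1)^j}{j!}\, dt_1
= \sum_{j=0}^{\infty} \frac{\kappa_{j1}}{j!} (-D)^j g(x), \tag{22.15}$$

provided that the interchange of summation and integration in the last step is legitimate.
Thus we have, for the regression of the mean,

$$Y = \sum_{j=0}^{\infty} \frac{\kappa_{j1}}{j!} \left[\frac{(-D)^j g(x)}{g(x)}\right]_{x = X}. \tag{22.16}$$

This notable result is due to Wicksell (1934b). The expansion is valid if the cumulants
exist and if $g(x)$ and its derivatives are continuous in the range and zero at its extremes;
for then the interchange of summation and integration in arriving at (22.15) is legitimate.
In particular, if $g(x)$ is normal and in standard measure we have

$$Y = \sum_{j=0}^{\infty} \frac{\kappa_{j1}}{j!} H_j(X), \tag{22.17}$$

where $H_j(x)$ is the Tchebycheff-Hermite polynomial of order j (6.20, vol. I, p. 145).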
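
The normal-case expansion (22.17) is easy to evaluate numerically once the cumulants $\kappa_{j1}$ are known. The sketch below is an illustration only: the function names are hypothetical, the polynomials are generated from the recurrence $H_{j+1}(x) = x H_j(x) - j H_{j-1}(x)$ (which gives $H_2 = x^2 - 1$, $H_3 = x^3 - 3x$), and the series is truncated at whatever cumulants are supplied.

```python
from math import factorial


def hermite(j, x):
    """Tchebycheff-Hermite polynomial H_j(x) with H_0 = 1, H_1 = x."""
    if j == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, j):
        # recurrence H_{k+1}(x) = x H_k(x) - k H_{k-1}(x)
        h_prev, h = h, x * h - k * h_prev
    return h


def regression_of_mean(kappa_j1, x):
    """Truncated series (22.17): sum over j of kappa_j1[j] / j! * H_j(x).

    kappa_j1[j] is the cumulant kappa_{j1}; x is in standard measure.
    """
    return sum(k / factorial(j) * hermite(j, x) for j, k in enumerate(kappa_j1))


# Bivariate normal in standard measure with correlation 0.5:
# kappa_01 = 0, kappa_11 = 0.5 and all higher kappa_j1 vanish.
print(regression_of_mean([0.0, 0.5], 1.2))
```

With only $\kappa_{11}$ non-zero the series collapses to the linear regression $Y = \rho X$; a non-zero $\kappa_{21}$ adds a term in $X^2 - 1$, and so on.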
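Example 22.1

For the bivariate normal distribution about the mean we have

$$dF = k \exp\left[-\frac{1}{2(1 - \rho^2)} \left\{\frac{x^2}{\sigma_1^2} - \frac{2\rho x y}{\sigma_1 \sigma_2} + \frac{y^2}{\sigma_2^2}\right\}\right] dx\, dy,$$

$$\phi(t_1, t_2) = \exp\left\{-\tfrac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho \sigma_1 \sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)\right\}.$$

Hence

$$\left[\frac{\partial \phi}{\partial t_2}\right]_{t_2 = 0} = -\rho \sigma_1 \sigma_2 t_1 \exp\left(-\tfrac{1}{2} \sigma_1^2 t_1^2\right),$$

and from (22.12)

$$g(x)\, \mu'_{1x} = \frac{i}{2\pi} \int_{-\infty}^{\infty} \rho \sigma_1 \sigma_2 t_1 \exp\left\{-\tfrac{1}{2} \sigma_1^2 t_1^2 - it_1 x\right\} dt_1
= \frac{\rho \sigma_2}{\sigma_1^2 \sqrt{2\pi}}\, x\, e^{-x^2 / (2\sigma_1^2)}.$$

Hence

$$\mu'_{1x} = \frac{\rho \sigma_2}{\sigma_1}\, x,$$

the familiar relation of linearity for the regression of the mean of the normal distribution.
Alternatively, direct from (22.17) we have, since $\kappa_{j1} = 0$, $j > 1$,

$$\frac{Y}{\sigma_2} = \kappa_{01} + \kappa_{11} H_1\!\left(\frac{X}{\sigma_1}\right),$$

$$Y = \frac{\rho \sigma_2}{\sigma_1} X, \text{ as before.}$$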


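Example 22.2 (Wicksell, 1934b)

Consider the frequency distribution of $\xi = \tfrac{1}{2}\sum (x^2)$ and $\eta = \tfrac{1}{2}\sum (y^2)$ where x, y are
samples of n from the bivariate normal population

$$dF \propto \exp\left\{-\frac{1}{2(1 - \rho^2)} \left(x^2 - 2\rho x y + y^2\right)\right\} dx\, dy.$$

The characteristic function is

$$\phi(t_1, t_2) = \left[\int\!\!\int \exp\left(\tfrac{1}{2} x^2 \theta_1 + \tfrac{1}{2} y^2 \theta_2\right) dF\right]^n
= \left\{(1 - \theta_1)(1 - \theta_2) - \rho^2 \theta_1 \theta_2\right\}^{-\frac{1}{2} n},$$

where $\theta_1 = it_1$ and $\theta_2 = it_2$.

The distribution function cannot be expressed in a simple form, but we may determine
the regressions without it. We have

$$\left[\frac{\partial \phi}{\partial \theta_2}\right]_{\theta_2 = 0} = \frac{n}{2} \cdot \frac{1 - (1 - \rho^2)\theta_1}{(1 - \theta_1)^{\frac{1}{2} n + 1}}.$$

Thus, from (22.12),

$$g(\xi)\, \mu'_{1\xi} = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\theta_1 \xi}\, \frac{n}{2} \cdot \frac{1 - (1 - \rho^2)\theta_1}{(1 - \theta_1)^{\frac{1}{2} n + 1}}\, dt_1.$$

The integrals may be evaluated by successive application of

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-\theta \xi}}{(1 - \theta)^k}\, dt = \frac{\xi^{k-1} e^{-\xi}}{\Gamma(k)}, \qquad \theta = it,$$

and we find, for the regression of $\eta$ on $\xi$,

$$\mu'_{1\xi} = \tfrac{1}{2} n (1 - \rho^2) + \rho^2 \xi,$$

$$\mu_{2\xi} = (1 - \rho^2)\left\{\tfrac{1}{2} n (1 - \rho^2) + 2\rho^2 \xi\right\}.$$

Thus the regressions of both mean and variance of $\eta$ on $\xi$ are linear.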

Fitting of Curvilinear Regression Lines 

22.6. From the practical point of view the case we have just considered, namely,
the one where the distribution or characteristic function is given, is exceptional. The 
determination of regression curves has, in the majority of cases, to be carried out from 
numerically specified material, which we shall consider in the remainder of the chapter. 
We shall confine our attention to the regression of the mean. 

In general the means of arrays will not lie exactly on a smooth curve (unless of course 
we choose a curve of order equal to the number of points to be fitted, less one). Nor do 
we know a priori what is the appropriate degree of a polynomial which will approx- 
imately represent the regression line. Let us, however, assume that the regression can 
be represented by a polynomial of order p : 

$$Y = a_0 + a_1 X + \ldots + a_p X^p. \tag{22.18}$$

We will consider later how the appropriate value of p is to be determined in particular
cases. Our problem is to determine the coefficients a from the data. As usual, we appeal
to the principle of least squares, that is to say, we find the values of the a's which will
minimise

$$U = \sum (y - a_0 - a_1 x - \ldots - a_p x^p)^2, \tag{22.19}$$

the summation extending over the sample values.

Differentiating with respect to $a_j$ we have

$$\sum (x^j y) - a_0 \sum x^j - a_1 \sum x^{j+1} - \ldots - a_p \sum x^{j+p} = 0,$$

and similar equations for $j = 0, 1, \ldots, p$. Writing the moments without primes for
simplicity and letting $\mu_j$ represent the jth moment of x, and $\mu_{j1}$ the bivariate moment
$\sum (x^j y)$, we have

$$\begin{aligned}
a_0 \mu_0 + a_1 \mu_1 + \ldots + a_p \mu_p &= \mu_{01} \\
a_0 \mu_1 + a_1 \mu_2 + \ldots + a_p \mu_{p+1} &= \mu_{11} \\
\ldots \\
a_0 \mu_p + a_1 \mu_{p+1} + \ldots + a_p \mu_{2p} &= \mu_{p1}.
\end{aligned} \tag{22.20}$$

Writing now

$$\Delta^{(p)} = \begin{vmatrix}
\mu_0 & \mu_1 & \ldots & \mu_p \\
\mu_1 & \mu_2 & \ldots & \mu_{p+1} \\
\ldots \\
\mu_p & \mu_{p+1} & \ldots & \mu_{2p}
\end{vmatrix} \tag{22.21}$$

and $\Delta_j^{(p)}$ for the determinant obtained by substituting the product-moments $\mu_{01}, \ldots, \mu_{p1}$
for the (j + 1)th column, we have, as the solution of (22.20),

$$a_j = \frac{\Delta_j^{(p)}}{\Delta^{(p)}}. \tag{22.22}$$
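
The normal equations (22.20) can be set up and solved mechanically from the sample moments. The following is a minimal sketch, assuming ungrouped data and a modest order p; the function name `fit_polynomial` and the test data are hypothetical, and Gauss-Jordan elimination is used here in place of the determinantal solution (22.22), to which it is algebraically equivalent.

```python
def fit_polynomial(xs, ys, p):
    """Least-squares coefficients a_0..a_p from the moment equations (22.20)."""
    n = len(xs)
    # mu[j] = j-th moment of x, mu_y[j] = bivariate moment of x^j and y
    mu = [sum(x ** j for x in xs) / n for j in range(2 * p + 1)]
    mu_y = [sum(x ** j * y for x, y in zip(xs, ys)) / n for j in range(p + 1)]

    # augmented matrix: row j is (mu_j, mu_{j+1}, ..., mu_{j+p} | mu_{j1})
    rows = [[mu[j + k] for k in range(p + 1)] + [mu_y[j]] for j in range(p + 1)]

    # Gauss-Jordan elimination with partial pivoting
    for col in range(p + 1):
        pivot = max(range(col, p + 1), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(p + 1):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]

    return [rows[j][p + 1] / rows[j][j] for j in range(p + 1)]


# Hypothetical data lying exactly on y = 1 - 2x + 0.5x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 - 2.0 * x + 0.5 * x * x for x in xs]
coeffs = fit_polynomial(xs, ys, 2)
print(coeffs)
```

For high orders, or x-values far from zero, the moment matrix becomes badly conditioned, which is one practical reason for preferring the orthogonal polynomials discussed later in the chapter.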


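22.7. It might appear that this solution could break down if $\Delta^{(p)} = 0$. Such a
thing is not possible, however, except in the most trivial case. In fact, if the distribution
function of the x's is $G(x)$, we have for $\Delta^{(p)}$

$$\Delta^{(p)} = \int \ldots \int \begin{vmatrix}
1 & x_0 & x_0^2 & \ldots & x_0^p \\
x_1 & x_1^2 & x_1^3 & \ldots & x_1^{p+1} \\
\ldots \\
x_p^p & x_p^{p+1} & x_p^{p+2} & \ldots & x_p^{2p}
\end{vmatrix} dG_0\, dG_1 \ldots dG_p,$$

or, if

$$D = \begin{vmatrix}
1 & x_0 & \ldots & x_0^p \\
1 & x_1 & \ldots & x_1^p \\
\ldots \\
1 & x_p & \ldots & x_p^p
\end{vmatrix},$$

$$\Delta^{(p)} = \int \ldots \int x_0^0\, x_1^1\, x_2^2 \ldots x_p^p\, D\, dG_0\, dG_1 \ldots dG_p.$$

If we now permute the suffixes of the x's in all possible ways and sum the (p + 1)! resultants
we obtain, in virtue of the definition of a determinant,

$$(p + 1)!\, \Delta^{(p)} = \int \ldots \int D^2\, dG_0\, dG_1 \ldots dG_p, \tag{22.23}$$

and hence $\Delta^{(p)}$ is essentially positive.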

22.8. From (22.18) and (22.22) we see that the regression line may be written

$$\begin{vmatrix}
Y & 1 & X & \ldots & X^p \\
\mu_{01} & \mu_0 & \mu_1 & \ldots & \mu_p \\
\mu_{11} & \mu_1 & \mu_2 & \ldots & \mu_{p+1} \\
\ldots \\
\mu_{p1} & \mu_p & \mu_{p+1} & \ldots & \mu_{2p}
\end{vmatrix} = 0. \tag{22.24}$$

The moments are found from observation, and equation (22.24) then gives the regression line.

It will be observed that in order to preserve the symmetry we have written $\mu_0$ for
the total frequency unity.

22.9. A somewhat different approach leads to the same solution. If we assume
that the regression line is a parabolic curve of order p, we may find the coefficients by the
principle of moments. This would lead us to identify the lower moments

$$\sum (x^j y) = \sum \{x^j (a_0 + a_1 x + \ldots + a_p x^p)\}$$

as far as was necessary to determine the a's. This clearly leads back to equation (22.20).
Orthogonal Polynomials

22.10. If we have a set of data and no guide, apart from inspection, to the appropriate value of p,
the only course is to fit curves of order 1, 2, 3, . . . and so forth, until we reach the point
when further terms do not improve the fit. Every time we add a new term the
determinantal arithmetic has to be done afresh. To obviate this nuisance we shall consider the
regression line in the form

$$Y = b_0 P_0 + b_1 P_1 + \ldots + b_p P_p, \tag{22.25}$$

where the P's are polynomials in X, $P_j$ being of degree j. We shall determine the P's
so that

$$\sum (P_j P_k) = 0, \quad j \neq k, \tag{22.26}$$

the summation extending over the observed values.

In minimising

$$\sum (y - b_0 P_0 - b_1 P_1 - \ldots - b_p P_p)^2,$$

we shall have equations such as

$$\sum (y P_j) - b_0 \sum (P_0 P_j) - \ldots - b_p \sum (P_p P_j) = 0,$$

and in virtue of the orthogonal relations (22.26), this reduces to

$$\sum (y P_j) - b_j \sum (P_j^2) = 0. \tag{22.27}$$

Thus $b_j$ is determined simply by $P_j$; and if, having fitted a curve of order p, we wish to
go a step farther and add a term $b_{p+1} P_{p+1}$, the coefficients $b_0, \ldots, b_p$ found from (22.27)
remain unaltered.


22.11. Furthermore, the use of these orthogonal polynomials will give us a very
convenient method of determining step by step the goodness of fit of the regression line.
We have

$$U = \sum (y - b_0 P_0 - \ldots - b_p P_p)^2
= \sum (y^2) - 2 b_0 \sum (y P_0) - \ldots - 2 b_p \sum (y P_p) + b_0^2 \sum (P_0^2) + \ldots + b_p^2 \sum (P_p^2).$$

But from (22.27) we may express $\sum (y P_j)$ in terms of $\sum (P_j^2)$, and we thus find

$$U = \sum (y^2) - b_0^2 \sum (P_0^2) - \ldots - b_p^2 \sum (P_p^2). \tag{22.28}$$

Thus the effect of any term $b_j P_j$ is to reduce U by $b_j^2 \sum (P_j^2)$ and we may examine the effect
of this term on U separately. If we find that the addition of any term $b_j P_j$ does not
reduce U significantly, we may conclude that it is redundant (so far as concerns the
representation of a regression line by a polynomial).


22.12. We proceed then to derive expressions for the orthogonal polynomials in the
general case. Later we shall examine the important special case when the values of x
are equidistant (as, for instance, with grouped data and most time-series).

Put

$$P_p = \sum_{j=0}^{p} c_{pj} X^j. \tag{22.29}$$

In this expression there are (p + 1) unknown constants c, and hence in all the polynomials
up to and including those of the pth order there are $\tfrac{1}{2}(p + 1)(p + 2)$ constants. The
orthogonal relations up to and including order p will then provide $\tfrac{1}{2} p (p + 1)$ conditions
on the c's, so that p + 1 constants are assignable at will. We will take one for each P and
assign it so that the coefficient of $X^j$ in $P_j$ is unity:

$$c_{jj} = 1. \tag{22.30}$$

In particular $c_{00} = P_0 = 1$. The orthogonal relations are then just sufficient to determine
the other c's. For instance, for the set $c_{pj}$, $j = 0, \ldots, p - 1$, they are

$$\sum P_p P_0 = \sum P_p = 0, \qquad \sum P_p P_1 = 0,$$

and so on. This system is clearly equivalent to the p equations

$$\sum P_p = 0, \qquad \sum x P_p = 0, \qquad \ldots, \qquad \sum x^{p-1} P_p = 0. \tag{22.31}$$

On substituting for the P's from (22.29) we get

$$\begin{aligned}
c_{p0} \mu_0 + c_{p1} \mu_1 + \ldots + c_{p,p-1} \mu_{p-1} + \mu_p &= 0 \\
c_{p0} \mu_1 + c_{p1} \mu_2 + \ldots + c_{p,p-1} \mu_p + \mu_{p+1} &= 0 \\
\ldots \\
c_{p0} \mu_{p-1} + c_{p1} \mu_p + \ldots + c_{p,p-1} \mu_{2p-2} + \mu_{2p-1} &= 0.
\end{aligned}$$

The solution may be expressed as a determinant in the usual way. Writing $\Delta^{(p)}$ in
accordance with (22.21) and $\Delta_{pj}$ for the minor of the term in the last row and (j + 1)th column
in (22.21), we find

$$c_{pj} = \frac{\Delta_{pj}}{\Delta^{(p-1)}}. \tag{22.32}$$

This expresses the c's in terms of the ascertainable constants $\mu$. It follows that

$$P_p = \frac{1}{\Delta^{(p-1)}} \begin{vmatrix}
\mu_0 & \mu_1 & \ldots & \mu_p \\
\mu_1 & \mu_2 & \ldots & \mu_{p+1} \\
\ldots \\
\mu_{p-1} & \mu_p & \ldots & \mu_{2p-1} \\
1 & X & \ldots & X^p
\end{vmatrix}. \tag{22.33}$$

We notice in particular that, in virtue of the diagonal symmetry of $\Delta$, we have

$$\Delta_{jk} = \Delta_{kj}.$$
22.13. In virtue of (22.31) we have 

P iP'f) = r (^^^ P,) 

and thus, from (22.33) on multiplying the last row and summing, 

V(p2) _ 

Similarly 


(22.3 4) 


jiP-i) 


y, , n . 7izl,)f^ 

(y K) = 


(22.35) 

(22.36) 
Finally, from (22.27)





. (22.37) 


Our problem is now solved. We have expressed all the unknowns in terms of
calculable determinants.

We may note in passing that since the regression equation must remain covariant
under a change of origin, all the coefficients b except $b_0$ are seminvariant, and the origin
can thus be chosen at will. $b_0$ itself is the mean of the y-values.

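22.14. Explicitly for the polynomials we have (taking $\mu_1 = 0$, $\mu_2 = 1$)

$$P_0 = 1 \tag{22.38}$$

$$P_1 = \begin{vmatrix} 1 & 0 \\ 1 & X \end{vmatrix} = X \tag{22.39}$$

$$P_2 = \begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & \mu_3 \\ 1 & X & X^2 \end{vmatrix} \div \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = X^2 - \mu_3 X - 1 \tag{22.40}$$

$$P_3 = \begin{vmatrix} 1 & 0 & 1 & \mu_3 \\ 0 & 1 & \mu_3 & \mu_4 \\ 1 & \mu_3 & \mu_4 & \mu_5 \\ 1 & X & X^2 & X^3 \end{vmatrix} \div \begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & \mu_3 \\ 1 & \mu_3 & \mu_4 \end{vmatrix}$$

$$= X^3 + \frac{1}{\mu_4 - \mu_3^2 - 1} \left\{-(\mu_5 - \mu_3 \mu_4 - \mu_3) X^2 + (\mu_3 \mu_5 - \mu_4^2 + \mu_4 - \mu_3^2) X + (\mu_5 - 2 \mu_3 \mu_4 + \mu_3^3)\right\} \tag{22.41}$$

and so on. In particular, if the population is normal,

$$P_1 = X, \qquad P_2 = X^2 - 1, \qquad P_3 = X^3 - 3X, \text{ etc.,}$$

the polynomials in this case reducing to the Tchebycheff-Hermite functions (6.20) which
we know to form an orthogonal system in the normal case.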
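
The coefficients of $P_2$ and $P_3$ in standard measure follow directly from the orthogonality conditions (22.31) with $\mu_0 = 1$, $\mu_1 = 0$, $\mu_2 = 1$. The sketch below is illustrative only (the function name `orthogonal_p2_p3` is hypothetical); it solves the three conditions for $P_3$ in turn rather than expanding determinants.

```python
def orthogonal_p2_p3(mu3, mu4, mu5):
    """Coefficients (lowest power first) of P_2 and P_3 when mu1 = 0, mu2 = 1.

    P_3 = X^3 + c32 X^2 + c31 X + c30 is made orthogonal to 1, X and X^2:
        mu3 + c32 + c30 = 0
        mu4 + c32*mu3 + c31 = 0
        mu5 + c32*mu4 + c31*mu3 + c30 = 0
    """
    p2 = [-1.0, -mu3, 1.0]            # P_2 = X^2 - mu3 X - 1
    d = mu4 - mu3 ** 2 - 1.0          # common denominator
    c32 = -(mu5 - mu3 * mu4 - mu3) / d
    c31 = -mu4 - c32 * mu3
    c30 = -mu3 - c32
    return p2, [c30, c31, c32, 1.0]


# Normal population: mu3 = 0, mu4 = 3, mu5 = 0 gives X^2 - 1 and X^3 - 3X.
p2, p3 = orthogonal_p2_p3(0.0, 3.0, 0.0)
print(p2, p3)
```

The denominator is positive for any proper distribution, since it is the determinant $\Delta^{(2)}$ in standard measure.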


Example 22.3. Ungrouped Data 

Table 22.1 shows the relationship between the percentage loss in weight (Y) and the
temperature (X) in a number of samples of soil. We require to find the regression of Y on X.

TABLE 22.1

Fitting of Curvilinear Regression for Ungrouped Data
(Data from J. R. H. Coutts, J. Agr. Sci., 20, 541.)

    Percentage Loss     Temperature
      in Weight.        (degrees F.).
          Y                  X

        3.71                100
        3.81                105
        3.86                110
        3.93                115
        3.96                121
        4.20                132
        4.34                144
        4.51                153
        4.73                163
        5.35                179
        5.74                191
        6.14                203
        6.51                212
        6.98                226
        7.44                237
        7.76                251

For the sums required we find:

$n = 16$, $\sum (y) = 82.97$, $\sum (y^2) = 459.4363$;
$\sum (x) = 2642$, $\sum (x^2) = 474{,}050$, $\sum (x^3) = 91{,}244{,}582$;
$\sum (x^4) = 18{,}553{,}164{,}842$, $\sum (x^5) = 3{,}930{,}294{,}225{,}302$;
$\sum (x^6) = 858{,}077{,}668{,}755{,}250$; $\sum (yx) = 14{,}736.19$;
$\sum (yx^2) = 2{,}819{,}909.45$, $\sum (yx^3) = 571{,}902{,}362.11$.


These can be run off fairly quickly on a machine. We have not bothered to take a different 
mean from those given, but in general a certain amount of arithmetic can be saved by 
so doing. 

Considering first of all the straightforward approach of (22.24), we have for the straight
line of closest fit

$$\begin{vmatrix}
Y & 1 & X \\
82.97 & 16 & 2642 \\
14{,}736.19 & 2642 & 474{,}050
\end{vmatrix} = 0,$$

reducing to

$$Y = 0.660 + 2.741 \left(\frac{X}{100}\right). \tag{22.42}$$

We have put $n\mu_j$ instead of $\mu_j$ in the second and third rows of the determinant, as we are
clearly entitled to do.

Similarly we find for the second- and third-order parabolas:

$$Y = 3.551 - 0.929 \left(\frac{X}{100}\right) + 1.070 \left(\frac{X}{100}\right)^2 \tag{22.43}$$

$$Y = 7.783 - 8.940 \left(\frac{X}{100}\right) + 5.875 \left(\frac{X}{100}\right)^2 - 0.9189 \left(\frac{X}{100}\right)^3. \tag{22.44}$$
Fig. 22.1 shows the straight line and cubic fitted to the data by these means. An
examination of the coefficients in the equations illustrates the point made above, that as successive
terms are added to the polynomials the coefficients of all terms may alter very considerably.

Fig. 22.1. Straight Line and Cubic Parabola of Closest Fit to the Data of Table 22.1.


Consider now the alternative approach by the use of orthogonal polynomials. By
the use of equations (22.33) we have

$$P_1 = \begin{vmatrix} 16 & 2642 \\ 1 & X \end{vmatrix} \div 16 = X - 165.125.$$

$$P_2 = \begin{vmatrix} 16 & 2642 & 474{,}050 \\ 2642 & 474{,}050 & 91{,}244{,}582 \\ 1 & X & X^2 \end{vmatrix} \div \begin{vmatrix} 16 & 2642 \\ 2642 & 474{,}050 \end{vmatrix} = X^2 - 343.137X + 27{,}032.435.$$

$$P_3 = \begin{vmatrix} 16 & 2642 & 474{,}050 & 91{,}244{,}582 \\ 2642 & 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842 \\ 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842 & 3{,}930{,}294{,}225{,}302 \\ 1 & X & X^2 & X^3 \end{vmatrix} \div \begin{vmatrix} 16 & 2642 & 474{,}050 \\ 2642 & 474{,}050 & 91{,}244{,}582 \\ 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842 \end{vmatrix}$$

$$= X^3 - 522.940X^2 + 87{,}182.434X - 4{,}605{,}047.$$

The b-coefficients are given by (22.37), the determinants in the numerator having been
already tabulated in finding the P's. We have

$$b_0 = 5.1856, \qquad b_1 = \frac{2.7409}{100}, \qquad b_2 = \frac{1.0695}{100^2}, \qquad b_3 = -\frac{0.91889}{100^3},$$

these being the values already found in arriving at (22.42) to (22.44). Thus

$$Y = 5.1856 + \frac{2.7409}{100}(X - 165.125) + \frac{1.0695}{100^2}(X^2 - 343.137X + 27{,}032.4)
- \frac{0.91889}{100^3}(X^3 - 522.940X^2 + 87{,}182.4X - 4{,}605{,}047). \tag{22.45}$$

If we stop at the second term we have

$$Y = 5.1856 + \frac{2.7409}{100}(X - 165.125) = 0.660 + 2.741 \left(\frac{X}{100}\right),$$

which is the same as (22.42), as of course it must be. Similarly, if we stop at the third or
fourth terms we find equations (22.43) or (22.44).

Now consider the fit of the regression line. We have from (22.35), 

5* X (Pi) = n bi r ( YP,). 

The determinants in this expression have already been evaluated in finding the regression 
line. Remembering that $\sum (y^2) = 459.436$ we obtain the following:

    j      b_j                       b_j^2 Σ(P_j^2)     U (equation (22.28))

    0       5.1856                      430.247             29.189
    1       2.7409 × 10^-2               28.390              0.799
    2       1.0695 × 10^-4                0.669              0.130
    3      -0.91889 × 10^-6               0.080              0.050


In calculations of this kind it is as well to take to an extra place of decimals, as the value 
of U is rather sensitive to small errors of rounding up. Even so, the last figure in U is 
unreliable. 
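
The arithmetic of this example can be checked without evaluating any determinants by building the orthogonal polynomials directly at the sample points. The sketch below is an illustration, not the procedure used in the text: it applies a Gram-Schmidt reduction to the powers $1, X, X^2, X^3$ (which yields the same polynomials with leading coefficient unity), uses the data of Table 22.1 with the eleventh y-value taken as 5.74 so that the sums agree with those quoted, and the function name `orthogonal_fit` is hypothetical. Because it works to full machine precision, its values of U may differ from the tabulated ones in the last figure.

```python
X = [100, 105, 110, 115, 121, 132, 144, 153,
     163, 179, 191, 203, 212, 226, 237, 251]
Y = [3.71, 3.81, 3.86, 3.93, 3.96, 4.20, 4.34, 4.51,
     4.73, 5.35, 5.74, 6.14, 6.51, 6.98, 7.44, 7.76]


def orthogonal_fit(xs, ys, p):
    """Return (b, U): coefficients b_0..b_p and the residual sum of squares
    remaining after each successive term, as in (22.27) and (22.28)."""
    polys = []                        # values of P_0, P_1, ... at the sample points
    b, U = [], []
    residual = sum(y * y for y in ys)
    for j in range(p + 1):
        values = [float(x) ** j for x in xs]
        for q in polys:               # remove components along lower-order P's
            proj = sum(v * w for v, w in zip(values, q)) / sum(w * w for w in q)
            values = [v - proj * w for v, w in zip(values, q)]
        polys.append(values)
        norm = sum(v * v for v in values)                      # sum of P_j^2
        bj = sum(y * v for y, v in zip(ys, values)) / norm      # (22.27)
        residual -= bj * bj * norm                              # (22.28)
        b.append(bj)
        U.append(residual)
    return b, U


b, U = orthogonal_fit(X, Y, 3)
for j, (bj, uj) in enumerate(zip(b, U)):
    print(j, bj, uj)
```

Adding a further term only appends one more entry to `b` and `U`; the earlier coefficients are untouched, which is the practical advantage claimed for the method.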

From the values of U it is clear that the fit is greatly improved by taking a quadratic 
term, and still further improved by adding the cubic term. How far a quartic term would 
improve matters cannot be decided without ascertaining the term. We have, however, 
not proceeded beyond the third degree because to do so would require moments of the 
eighth order. For a small population such as this, which in practical applications would 
be considered as a sample only, the errors in higher moments would probably be considerable. 

The reader who works through the arithmetic of this example will find that there is
about the same labour involved in either method. It is in the fitting of higher order terms 
that the method of orthogonal polynomials shows its superiority. In practical cases it 
is preferable to avoid the large numbers arising from the evaluation of determinants by 
a modification of the procedure given in 22.27 below. 

Example 22.4. Grouped Data 

In Example 14.1 (vol. I, p. 331) we considered the correlation between age and highest 
audible pitch in 3379 subjects and found the linear regressions. Let us take the work 
a stage further. 
For the data of the table (X = age, Y = pitch) we find:

$\sum (y) = -708$; $\sum (y^2) = 8894$; $\sum (yx) = -12{,}535$;
$\sum (x) = 2604$; $\sum (x^2) = 47{,}392$; $\sum (x^3) = 387{,}498$;
$\sum (x^4) = 4{,}842{,}172$; $\sum (x^5) = 62{,}401{,}794$; $\sum (x^6) = 883{,}576{,}012$.

As a variation on the procedure of the previous example, we will convert these figures
to moments about the mean (with Sheppard's corrections) and put them in standard measure.
We find:

$\mu_{01} = -0.209{,}529$; $\mu_{02} = 2.504{,}904$;
$\mu_{10} = 0.770{,}642$; $\mu_{20} = 13.348{,}229$.

In standard measure the other moments are

$\mu_3 = 1.705{,}375$; $\mu_4 = 6.295{,}759$;
$\mu_5 = 20.729{,}861$; $\mu_6 = 78.409{,}775$.

We may now use equations (22.38), etc., direct, and find

$P_0 = 1$, $P_1 = X$, $P_2 = X^2 - 1.705X - 1$, $P_3 = X^3 - 3.471X^2 - 0.376X + 3.560$.

We now require the moments $\mu_{21}$ and $\mu_{31}$. We find

$\sum (yx^2) = -112{,}495$,
$\sum (yx^3) = -1{,}399{,}639$,

and hence, with Sheppard's corrections and in standard measure,

$\mu_{21} = -1.177{,}920$, $\mu_{31} = -4.215{,}958$.

We now find, from (22.37),

$b_0 = 0$,
$b_1 = -0.613{,}626$,
$b_2 = -0.055{,}064$,
$b_3 = 0.010{,}205$.

The regression line of the third degree is then

$$Y = -0.6136X - 0.0551 (X^2 - 1.705X - 1) + 0.0102 (X^3 - 3.471X^2 - 0.376X + 3.560),$$

where the origin is at the mean and the units are in standard measure.


Standard Errors of Regression Coefficients

22.15. The standard errors of unknowns derived from least squares can be found
by the use of a result due originally to Gauss. Suppose $\alpha_j$ is the true value of $a_j$ and the
residuals $y - \Sigma\,\alpha_j x^j$ are distributed normally with variance $v$. Writing $da_j = \alpha_j - a_j$,
we have for the frequency function of the residuals ($S$ denoting summation over the sample
and $\Sigma$ over the values $a_0$ to $a_p$, and the cross-term vanishing because the $a$’s are minimal values):

$$dF \propto \text{constant} \times \exp\Big\{-\frac{1}{2v}\,S\Big(\sum_j da_j\,x^j\Big)^2\Big\}$$

$$\propto \exp\Big\{-\frac{1}{2v}\,S\sum_{j,k} da_j\,da_k\,x^{j+k}\Big\}$$

$$\propto \exp\Big\{-\frac{n}{2v}\sum_{j,k}\big(da_j\,da_k\,\mu'_{j+k}\big)\Big\}. \tag{22.46}$$

In the limit, then, the deviations are distributed in the multivariate normal form, and from
the results of 15.12 (vol. I, p. 376) it follows that

$$\operatorname{var} a_j = \frac{v}{n}\,\frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}}, \tag{22.47}$$

for the determinant whose terms are $\mu'_{j+k}$ is in fact the determinant we have already defined
as $\Delta^{(p)}$, and $\Delta^{(p)}_{jj}$ is the minor of the item in the $j$th row and column.

Now $v$ is the variance of deviations from the theoretical regression line, and in terms
of variations about the observed line we have, remembering the result of 18.17,

$$\operatorname{var} a_j = \frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}}\,\frac{\operatorname{var} e}{n - p - 1}. \tag{22.47}$$

Since the correlation ratio of $y$ on $x$ is given by

$$\operatorname{var} e = \operatorname{var} y\,(1 - \eta^2), \tag{22.48}$$

we have also

$$\operatorname{var} a_j = \frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}}\,\frac{(1 - \eta^2)\operatorname{var} y}{n - p - 1}. \tag{22.49}$$

For large samples the replacement of $n$ by $n - p - 1$ in the denominator is an unnecessary
refinement.

22.16. For the case of orthogonal polynomials the results apply with a slight but
important simplification. The coefficient $b_j$ is the same as $a_j$ if polynomials up to order $j$
only are fitted, and hence, since $\Delta^{(j)}_{jj} = \Delta^{(j-1)}$, we have

$$\operatorname{var} b_j = \frac{\Delta^{(j-1)}}{\Delta^{(j)}}\,\frac{(1 - \eta^2)\operatorname{var} y}{n - j - 1}. \tag{22.50}$$

The same result follows by modifying (22.46), which for orthogonal polynomials becomes

$$dF \propto \exp\Big\{-\frac{1}{2v}\sum_j \big[S(P_j^2)\,(db_j)^2\big]\Big\}, \tag{22.51}$$

showing that the $b$’s are independently and normally distributed with variance

$$\operatorname{var} b_j = \frac{v}{S(P_j^2)},$$

reducing to (22.50) in virtue of (22.36).

22.17. If the parent population is normal, $\eta = \rho$, and the determinants $\Delta^{(j)}$ can be
evaluated explicitly in terms of the variance of $x$. In fact,

$$\frac{\Delta^{(j-1)}}{\Delta^{(j)}} = \frac{1}{j!\,(\operatorname{var} x)^j}, \tag{22.52}$$

and hence

$$\operatorname{var} b_j = \frac{1}{n - j - 1}\,\frac{(1 - \rho^2)\operatorname{var} y}{j!\,(\operatorname{var} x)^j}, \tag{22.53}$$

or, in standard measure,

$$\operatorname{var} b_j = \frac{1 - \rho^2}{j!\,(n - j - 1)}. \tag{22.54}$$

Equation (22.52) can be found by evaluating the determinants in the ordinary way, but
it follows more simply from the consideration that $\Delta^{(j)}/\Delta^{(j-1)}$ is equal to $\frac{1}{n}\Sigma P_j^2$, which, in the
normal case, is for large samples equal to $E(P_j^2) = j!\,(\operatorname{var} x)^j$ (6.22, vol. I, p. 147, with
a change of scale).


22.18. The advantages of using orthogonal polynomials instead of powers of $X$
are apparent in the forms taken by the standard errors of the coefficients $a$ and $b$. The
latter are independent of the order of the polynomial fitted and can be tested once and for
all. The former do not possess this advantage. It seems preferable, therefore, as a matter
of technique, to work with orthogonal polynomials throughout, whenever regressions of
order higher than the first are likely to require investigation.


Example 22.5 

Consider again the data of Example 22.4 (regression of highest audible pitch on age).
We have there expressed the regression line in standard measure and in the orthogonal
form, and may therefore use equation (22.50) in the form

$$\operatorname{var} b_1 = \frac{\Delta^{(0)}}{\Delta^{(1)}}\,\frac{1 - \eta^2}{n},\qquad \operatorname{var} b_2 = \frac{\Delta^{(1)}}{\Delta^{(2)}}\,\frac{1 - \eta^2}{n},\qquad \operatorname{var} b_3 = \frac{\Delta^{(2)}}{\Delta^{(3)}}\,\frac{1 - \eta^2}{n}.$$

(The sample number $n$ is so large that we can ignore the element $-(j + 1)$ in the divisor.)
The determinants required are already known, having been ascertained in the course of
the work. We have

$$\frac{\Delta^{(0)}}{\Delta^{(1)}} = 1,\qquad \frac{\Delta^{(1)}}{\Delta^{(2)}} = 0.4189,\qquad \frac{\Delta^{(2)}}{\Delta^{(3)}} = 0.0985.$$

We also require $\eta$, which was found in Example 14.11 (vol. I, p. 352) to be $\eta_{yx} = 0.6231$.
Thus $1 - \eta^2 = 0.6117$. We find

$$\operatorname{var} b_1 = \frac{1.8104}{10^4},\qquad \operatorname{var} b_2 = \frac{0.7584}{10^4},\qquad \operatorname{var} b_3 = \frac{0.1783}{10^4}.$$

The values of the $b$’s and their standard errors are then

    Order.        b.         Standard Error.
      1       - 0.6136          0.013
      2       - 0.0551          0.0087
      3         0.0102          0.0042

In all cases we should judge the coefficients significant, as being more than twice the standard
error. Although, therefore, the second- and third-order terms are small and the regression
is approximately linear, the deviation from linearity is not merely a chance effect.
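
The arithmetic of this example can be checked with a short script. The following is only a sketch: it takes the determinant ratios, the value of the correlation ratio and the sample number 3379 as quoted in Examples 22.4 and 22.5, and applies (22.50) in standard measure with divisor n. The variable and function names are illustrative assumptions, not part of the text.

```python
import math

n = 3379                      # number of subjects (Example 22.4)
eta = 0.6231                  # correlation ratio quoted in Example 22.5
# Determinant ratios Delta^(j-1) / Delta^(j) quoted in the example
ratios = {1: 1.0, 2: 0.4189, 3: 0.0985}
# Fitted coefficients b_j from Example 22.4 (standard measure)
b = {1: -0.6136, 2: -0.0551, 3: 0.0102}


def standard_error(ratio, eta, n):
    """Large-sample standard error of b_j from (22.50), divisor taken as n."""
    return math.sqrt(ratio * (1.0 - eta ** 2) / n)


se = {j: standard_error(r, eta, n) for j, r in ratios.items()}
for j in sorted(se):
    # order, standard error, and |b_j| expressed in standard errors
    print(j, round(se[j], 4), round(abs(b[j]) / se[j], 1))
```

Each coefficient exceeds twice its standard error, which is the rough criterion applied in the text.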

Exact Significance Tests in the Normal Case 

22.19. When the parent population is normal, more exact tests than those derived
from the use of standard errors may be obtained. We have already seen (14.21, vol. I,
p. 348) that a function dependent only on sample values and the first regression coefficient
$b_1$ was distributed in “Student’s” form. We proceed to generalise this result.

Consider in the first place the linear regression equation

$$Y = \bar{y} + b_1 (X - \bar{x}), \tag{22.55}$$

and let $\beta_1$ be the population value of $b_1$ and $\sigma_2^2$ the variance of $y$ within arrays in the population.
Since the parent is normal, this variance is the same, $\sigma_2^2$, for any fixed value of $x$.

Our estimate of $\beta_1$ is

$$b_1 = \frac{\Sigma\,y\,(x - \bar{x})}{\Sigma\,(x - \bar{x})^2},$$

where summation takes place over the sample values. Thus for fixed values of $x$ we have

$$\operatorname{var} b_1 = \frac{\Sigma\,(x - \bar{x})^2 \operatorname{var} y}{\{\Sigma\,(x - \bar{x})^2\}^2} \tag{22.56}$$

$$= \frac{\sigma_2^2}{\Sigma\,(x - \bar{x})^2}. \tag{22.57}$$

Since the mean value of $b_1$ is $\beta_1$, we see that, for samples having the same values of $x$,
$b_1$ is normally distributed about mean $\beta_1$ with variance given by (22.57), for it is a linear
function of the $y$’s, which are themselves normal. Consequently

$$\frac{(b_1 - \beta_1)\sqrt{\Sigma\,(x - \bar{x})^2}}{\sigma_2} \tag{22.58}$$

is distributed normally about zero mean with unit variance.

If $\sigma_2$ were known we could test the significance of $b_1$ in the ordinary way;
but in fact it is not, and the substitution of an estimator of the Type III
form brings in the $t$-distribution in the usual way. Consider the function $s$, where

$$s^2 = \frac{1}{n - 2}\,\Sigma\,(y - Y')^2, \tag{22.59}$$

and $Y'$ represents the values “predicted” by the regression line, that is, the values
$\bar{y} + b_1(x - \bar{x})$. We shall show presently that $s^2$ is distributed in the Type III form
with $n - 2$ degrees of freedom independently of $b_1 - \beta_1$. It follows that

$$t = \frac{(b_1 - \beta_1)\sqrt{\Sigma\,(x - \bar{x})^2}\,\sqrt{n - 2}}{\sqrt{\Sigma\,(y - Y')^2}}$$

is distributed as “Student’s” $t$ with $\nu = n - 2$ degrees of freedom.

A given value $\beta_1$ may be tested accordingly. The reader should note that the inference is a
conditional one, that is to say, we are considering the distribution of $t$ in samples
for which the $x$’s are the same as those observed.
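
As an illustration of the test just described, the sketch below computes the slope, the residual sum of squares about the fitted line, and the statistic t with n - 2 degrees of freedom. The function name `slope_t` and the five data points are hypothetical and serve only to show the arithmetic.

```python
import math


def slope_t(x, y, beta1=0.0):
    """Return (b1, t, degrees of freedom) for testing the slope against beta1."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Least-squares slope: sum of y(x - xbar) over sum of (x - xbar)^2
    b1 = sum(yi * (xi - xbar) for xi, yi in zip(x, y)) / sxx
    # Residual sum of squares about the fitted line Y' = ybar + b1 (x - xbar)
    rss = sum((yi - ybar - b1 * (xi - xbar)) ** 2 for xi, yi in zip(x, y))
    t = (b1 - beta1) * math.sqrt(sxx) * math.sqrt(n - 2) / math.sqrt(rss)
    return b1, t, n - 2


# Hypothetical data, chosen only for illustration
x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 5, 4]
b1, t, df = slope_t(x, y)
print(b1, round(t, 4), df)
```

The value of t would then be referred to the tables of “Student’s” distribution with the stated degrees of freedom; the x values are treated as fixed, in keeping with the conditional nature of the inference.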
22.20. To establish the foregoing result we have to show that $\Sigma\,(y - Y')^2$, the sum
of squares of residuals about the observed regression line, is distributed in the Type III
form with $n - 2$ degrees of freedom. This is a particular case of a general theorem we
shall prove at the beginning of the next chapter, but we will sketch an ad hoc proof here
for the sake of completeness.

Since the population is normal, the deviation of $y$ from the true regression line for
fixed $x$, $Y = \beta_0 + \beta_1 (X - \bar{x})$, where $\beta_0$ is the parent mean of $y$, is normal with variance
$\sigma_2^2$. Now

$$\frac{(n - 2)\,s^2}{\sigma_2^2} = \frac{1}{\sigma_2^2}\,\Sigma\,(y - Y')^2 = \frac{1}{\sigma_2^2}\,\Sigma\,\{y - b_0 - b_1(x - \bar{x})\}^2$$

$$= \frac{1}{\sigma_2^2}\,\Sigma\,\{y - \beta_0 - \beta_1(x - \bar{x}) - (b_0 - \beta_0) - (b_1 - \beta_1)(x - \bar{x})\}^2.$$

The coefficients $b_0$ and $b_1$ were chosen so as to minimise this sum, and hence

$$\frac{(n - 2)\,s^2}{\sigma_2^2} = \frac{1}{\sigma_2^2}\,\Sigma\,\{y - \beta_0 - \beta_1(x - \bar{x})\}^2 - \frac{n}{\sigma_2^2}\,(b_0 - \beta_0)^2 - \frac{1}{\sigma_2^2}\,(b_1 - \beta_1)^2\,\Sigma\,(x - \bar{x})^2. \tag{22.61}$$

The first term is the sum of squares of $n$ normal variates with zero mean and unit variance;
the second is also the square of such a variate, for it is the square of the deviation of the mean of $y$ about
its true value divided by the variance $\sigma_2^2/n$; and the third term is also the square of such a variate, as
shown above.

It does not follow immediately that $(n - 2)\,s^2/\sigma_2^2$ is distributed as the sum of squares of
$n - 2$ normal variates in standard measure, for the constituent items might be correlated.
Let us then find an orthogonal transformation to new variates $\xi_1, \ldots, \xi_n$ linearly related
to the $n$ normal variates $y - \beta_0 - \beta_1(x - \bar{x})$. These also will be normally and independently
distributed. In particular (remembering that our summations refer to the
$y$’s and $x$’s, but the latter are constant for our distributions), take

$$\xi_1 = \frac{1}{\sigma_2\sqrt{n}}\,\Sigma\,\{y - \beta_0 - \beta_1(x - \bar{x})\} = \frac{\sqrt{n}}{\sigma_2}\,(b_0 - \beta_0),$$

$$\xi_2 = \frac{1}{\sigma_2}\,\frac{\Sigma\,\{y - \beta_0 - \beta_1(x - \bar{x})\}(x - \bar{x})}{\sqrt{\Sigma\,(x - \bar{x})^2}} = \frac{1}{\sigma_2}\,(b_1 - \beta_1)\sqrt{\Sigma\,(x - \bar{x})^2}.$$

$\xi_1$ and $\xi_2$ are then normal variates in standard measure. Moreover they are orthogonal, since
the sum of products of their coefficients is

$$\frac{1}{\sigma_2\sqrt{n}}\,\Sigma\,\frac{x - \bar{x}}{\sigma_2\sqrt{\Sigma\,(x - \bar{x})^2}} = k\,\Sigma\,(x - \bar{x}) = 0.$$

Consequently our transformation exhibits the first term on the right in (22.61) as $\sum_{j=1}^{n}\xi_j^2$ and
the second and third as $\xi_1^2$ and $\xi_2^2$. Thus the total is distributed as $\sum_{j=3}^{n}\xi_j^2$, which is the
result required.
We may compare the result of 18.17, in which we saw that the mean value of the sum of
squares of the true errors in standard measure was $n$, whereas that of the sum of squares
of the residuals was $n - p - 1$, one degree of freedom having been lost in the
sum of squares of residuals for every constant estimated; and the approximate result of
21.20, in which $\chi^2$ had to lose a degree for each constant fitted by maximum likelihood.
Fundamentally all these results are different aspects of the same thing and rest on the fact
that the variation of the sum of squares of normal variates in standard measure is spherically
symmetric, so that a hyperplane in the sample space “cuts” the distribution in a spherically
symmetric form of one lower degree of freedom.


Extension to Curvilinear Regression

22.21. The foregoing result can be extended without difficulty to the case when
the regression is curvilinear. If

$$Y = b_0 P_0 + b_1 P_1 + \ldots + b_p P_p,$$

where the $P$’s are orthogonal, then

$$b_j = \frac{\Sigma\,(y P_j)}{\Sigma\,P_j^2},$$

and we have also, for the variance of $b_j$ when the $x$’s are fixed,

$$\operatorname{var} b_j = \frac{\sigma_2^2}{\Sigma\,P_j^2},$$

so that

$$\frac{(b_j - \beta_j)\sqrt{\Sigma\,P_j^2}}{\sigma_2}$$

is distributed normally with zero mean and unit variance. Taking as our estimate of $\sigma_2^2$

$$s^2 = \frac{1}{n - j - 1}\,\Sigma\,(y - Y')^2,$$

we see, as before, that

$$t = \frac{(b_j - \beta_j)\sqrt{n - j - 1}\,\sqrt{\Sigma\,P_j^2}}{\sqrt{\Sigma\,(y - Y')^2}} \tag{22.62}$$

is distributed as “Student’s” $t$ with $\nu = n - j - 1$ degrees of freedom.

It will be observed that in this and the previous section we have not assumed anything
about the distribution in $x$-arrays. We have merely supposed that for any given $x$, $y$ is
normally distributed with constant variance.


Example 22.6 

Consider again the soil data of Example 22.3. We found, for the cubic term in the
parabola, a coefficient of $-0.9189 \times 10^{-6}$. Is this significant?

Here $b_j = -0.9189 \times 10^{-6}$ for $j = 3$;

$$\sqrt{n - j - 1} = \sqrt{16 - 4} = 3.464.$$

We have already found $\Sigma\,(y - Y')^2 = U$, namely

$$U = 0.050.$$

We further require $\Sigma\,P_3^2$, which has been obtained incidentally in the working of Example
22.3 and is equal to $9.31525 \times 10^{10}$. Hence

$$t = \frac{-0.9189 \times 10^{-6}\,(3.464)\,(3.052 \times 10^{5})}{0.2236} = -4.3.$$

This is highly significant.
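
The value of t in this example can be reproduced directly from (22.62) with the hypothetical value of the coefficient taken as zero. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
import math

b3 = -0.9189e-6          # cubic coefficient from Example 22.3
n, j = 16, 3             # sample number and order of the term tested
U = 0.050                # residual sum of squares about the cubic
sum_P3_sq = 9.31525e10   # sum of squares of the third polynomial

# (22.62) with beta_j = 0
t = b3 * math.sqrt(n - j - 1) * math.sqrt(sum_P3_sq) / math.sqrt(U)
print(round(t, 2))
```

The statistic has n - j - 1 = 12 degrees of freedom, and its magnitude of rather more than 4 agrees with the figure in the text.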
Case when the Independent Variate proceeds by Equal Steps

22.22. An important special case arises when the independent variate has values
which are equidistant, as, for instance, in most time-series and in grouped data. If we take
the interval between successive values of $x$ as our unit, the variate-values may, by a suitable
choice of origin, be taken as $0, 1, 2, \ldots, n - 1$. The various moment-functions
entering into the expressions for polynomials, etc., may be written down once for
all. Furthermore, this case lends itself to simpler summatory methods of forming the
actual polynomial values and the residuals.


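22.23. For a set of values $0, 1, 2, \ldots, n - 1$, we have

$$\Sigma(x) = \frac{n(n - 1)}{2},\qquad \Sigma(x^2) = \frac{n(n - 1)(2n - 1)}{6},\qquad \text{etc.}$$

Thus

$$\mu'_1 = \tfrac{1}{2}(n - 1),\qquad \mu_2 = \frac{n^2 - 1}{12},\ \ldots$$

From (22.38) and similar equations we then find

$$P_1 = X - \mu'_1,\qquad P_2 = P_1^2 - \frac{\mu_3}{\mu_2}P_1 - \mu_2 = P_1^2 - \frac{n^2 - 1}{12}, \tag{22.63}$$

and so on. The polynomials may be obtained more systematically as follows:
We show first of all that

$$\sum_{j=0}^{p}\binom{n - 1}{j}\frac{1}{q + j}\,\Delta^j P_p = 0,\qquad q = 1, 2, \ldots, p, \tag{22.64}$$

where $\Delta^j$ is the $j$th terminal difference of $P_p$ and the $x$’s range from 0 to $n - 1$. In fact,
from Newton’s interpolation formula,

$$P_p = \sum_{j=0}^{p}\frac{x^{[j]}}{j!}\,\Delta^j P_p, \tag{22.65}$$

and since the $P$’s are orthogonal,

$$\Sigma\,(x + q - 1)^{[q-1]}\,P_p = 0,\qquad q \le p.$$

Substituting from (22.65), we find for the term in $\Delta^j$:

$$\Sigma\,(x + q - 1)^{[q-1]}\,\frac{x^{[j]}}{j!}\,\Delta^j P_p = \frac{\Delta^j P_p}{j!}\,\Sigma\,\frac{(x + q)^{[q+j]} - (x + q - 1)^{[q+j]}}{q + j}$$

$$= \frac{(n + q - 1)^{[q+j]}}{(q + j)\,j!}\,\Delta^j P_p. \tag{22.66}$$

Thus for all $q$ from 1 to $p$ we have

$$0 = \sum_{j=0}^{p}\frac{(n + q - 1)^{[q+j]}}{(q + j)\,j!}\,\Delta^j P_p = \frac{(n + q - 1)!}{(n - 1)!}\sum_{j=0}^{p}\binom{n - 1}{j}\frac{\Delta^j P_p}{q + j},\qquad q \le p,$$

whence follows (22.64). We now find functions obeying these conditions.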
Consider

$$y = C\,(x + p)^{[p]}. \tag{22.67}$$

This is a polynomial of degree $p$, and if for $x = 0, 1, \ldots, p$ it assumes the values $y_0, \ldots, y_p$
we have

$$y = \sum_{j=0}^{p}\frac{(-1)^{p-j}}{j!\,(p - j)!}\,\frac{x^{[p+1]}}{x - j}\,y_j, \tag{22.68}$$

for this also is of degree $p$ and has the right values at $x = 0, \ldots, p$. Taking now

$$y_j = (-1)^{p-j}\,\frac{(n - 1)!\,(p - j)!}{(n - j - 1)!}\,\Delta^j P_p, \tag{22.69}$$

we find that for $x = -q$

$$y(-q) = C\,(p - q)^{[p]} = (-1)^p\,(p + q)^{[p+1]}\sum_{j=0}^{p}\binom{n - 1}{j}\frac{\Delta^j P_p}{q + j}. \tag{22.70}$$

Now from the definition of $y$ this clearly vanishes for $-x = q = 1, \ldots, p$, and thus
(22.70) is zero. Comparing it with (22.64) we see that the conditions are satisfied if we
give to $\Delta^j P_p$ the value of (22.69), i.e.

$$\Delta^j P_p = (-1)^{p-j}\,\frac{(n - j - 1)!}{(n - 1)!\,(p - j)!}\,y_j = C\,(-1)^{p-j}\,\frac{(n - j - 1)!\,(p + j)!}{(n - 1)!\,j!\,(p - j)!}. \tag{22.71}$$

The constant $C$ is evaluated by the fact that the coefficient of $x^p$ in $P_p$ is unity, giving
$\Delta^p P_p = p!$. This gives

$$C = \frac{(p!)^2\,(n - 1)!}{(2p)!\,(n - p - 1)!}. \tag{22.72}$$

Finally, substituting in (22.65), we find

$$P_p = \sum_{j=0}^{p}(-1)^{p-j}\,\frac{(p!)^2\,(p + j)!\,(n - j - 1)!}{(2p)!\,(j!)^2\,(p - j)!\,(n - p - 1)!}\;X(X - 1)\cdots(X - j + 1), \tag{22.73}$$

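where by convention the term $X(X - 1)\cdots(X - j + 1)$ is unity for $j = 0$. The first six polynomials are

$$P_1 = X - \frac{n - 1}{2}$$

$$P_2 = P_1^2 - \frac{n^2 - 1}{12}$$

$$P_3 = P_1^3 - \frac{3n^2 - 7}{20}\,P_1$$

$$P_4 = P_1^4 - \frac{3n^2 - 13}{14}\,P_1^2 + \frac{3\,(n^2 - 1)(n^2 - 9)}{560}$$

$$P_5 = P_1^5 - \frac{5\,(n^2 - 7)}{18}\,P_1^3 + \frac{15n^4 - 230n^2 + 407}{1008}\,P_1$$

$$P_6 = P_1^6 - \frac{5\,(3n^2 - 31)}{44}\,P_1^4 + \frac{5n^4 - 110n^2 + 329}{176}\,P_1^2 - \frac{5\,(n^2 - 1)(n^2 - 9)(n^2 - 25)}{14784} \tag{22.74}$$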
Four more values are given by Allan (1930), to whom the above derivation of (22.73)
is due.

Values of the polynomials up to and including the fifth are given in Fisher and Yates’
Statistical Tables up to $n$ = 52.
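
Where tables are not to hand, the polynomial values can be generated from the explicit sum (22.73). The sketch below is one possible implementation in exact rational arithmetic; the function names `falling` and `P` are illustrative, and x is taken to run over 0, 1, ..., n - 1 as in 22.22. For n = 13 it gives, for example, the values 22, 11, 2, ... for the second polynomial.

```python
from fractions import Fraction
from math import factorial


def falling(x, j):
    """Factorial power x(x-1)...(x-j+1); equal to 1 when j = 0."""
    out = 1
    for k in range(j):
        out *= (x - k)
    return out


def P(p, x, n):
    """Value of the orthogonal polynomial P_p at x, following (22.73)."""
    total = Fraction(0)
    for j in range(p + 1):
        coeff = Fraction(
            (-1) ** (p - j) * factorial(p) ** 2
            * factorial(p + j) * factorial(n - j - 1),
            factorial(2 * p) * factorial(j) ** 2
            * factorial(p - j) * factorial(n - p - 1),
        )
        total += coeff * falling(x, j)
    return total


n = 13
P2 = [P(2, x, n) for x in range(n)]
P3_over_6 = [P(3, x, n) / 6 for x in range(n)]
print([int(v) for v in P2])
print([int(v) for v in P3_over_6])
```

Exact fractions are used so that the orthogonality of the resulting columns can be verified without rounding error.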


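22.24. We can now find an explicit expression for $\Sigma\,P_p^2$. Since the polynomials
are orthogonal we have

$$\Sigma\,P_p^2 = \Sigma\,(x + p)^{[p]}\,P_p,$$

which, by the argument resulting in (22.64), leads to

$$\Sigma\,P_p^2 = \sum_{j=0}^{p}\frac{(n + p)^{[p+j+1]}}{j!}\,\frac{\Delta^j P_p}{p + j + 1}.$$

Putting $q = p + 1$ in (22.67) and (22.70), we have

$$y(-p - 1) = C\,(-1)^{[p]} = (-1)^p\,(2p + 1)^{[p+1]}\sum_{j=0}^{p}\binom{n - 1}{j}\frac{\Delta^j P_p}{p + j + 1},$$

whence, after a little rearrangement,

$$\sum_{j=0}^{p}\frac{(n + p)!}{j!\,(n - j - 1)!}\,\frac{\Delta^j P_p}{p + j + 1} = C\,\frac{(p!)^2\,(n + p)!}{(2p + 1)!\,(n - 1)!},$$

and thus, substituting for $C$ from (22.72), we find

$$\Sigma\,P_p^2 = \frac{(p!)^4}{(2p)!\,(2p + 1)!}\;n\,(n^2 - 1)(n^2 - 4)\cdots(n^2 - p^2). \tag{22.75}$$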
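
The closed form for the sum of squares of the polynomial of order p over x = 0, 1, ..., n - 1, namely (p!)^4 n(n^2 - 1)...(n^2 - p^2) / {(2p)! (2p + 1)!}, is easily evaluated. The sketch below does so in exact arithmetic; the function name `sum_sq` is an assumption made for illustration. For n = 13 the first two values are 182 and 2002.

```python
from fractions import Fraction
from math import factorial


def sum_sq(p, n):
    """Sum over x = 0..n-1 of P_p(x)^2, from the closed form for equal steps."""
    prod = n
    for k in range(1, p + 1):
        prod *= (n * n - k * k)
    return Fraction(factorial(p) ** 4 * prod,
                    factorial(2 * p) * factorial(2 * p + 1))


for p in range(1, 5):
    print(p, sum_sq(p, 13))
```

For orders 3 and 4 the results are 572 × 36 and 68,068 × (12/7)^2 respectively, the numerical factors arising because published tables commonly list convenient integer multiples of the polynomials rather than the polynomials themselves.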


22.25. It is also possible to express the orthogonal polynomials in terms of central
differences. We quote without proof the results (for details of which see Allan, 1930):


where 


p ! 

ip~i) 


[in]" P, Z 


A-iyiP ~:i ~i)'- [-PJ 




(p - 2;) ! j ! 2« 


[a;]''’ 


X 


{ n 


1 )}! 


{x -iin - 1)}! 


. (22.76) 
.' (22.77) 


The series is summed from $j = 0$ until $2j > p$, when the denominator vanishes, and $(p - \tfrac{1}{2})!$
is written for $\Gamma(p + \tfrac{1}{2})$ to preserve the factorial notation. In practice the polynomials
for particular examples are not determined from (22.73) or (22.76) but by the use of tables,
or by summation from differences in the manner of Example 22.9 below.


Example 22.7 

For the fitting of a regression line in the case of equidistant intervals various methods 
are in use. A choice between them depends on the length of the series, the order of regres- 
sion to which it is desired to go, and the computing resources at the investigator’s disposal. 
We will illustrate two methods in this and the next example. 


TABLE 22.2

Fitting of Regression Line by Orthogonal Polynomials: Equidistant x-intervals.

    (1)        (2)          (3)           (4)        (5)          (6)
    Year.    Variate.    Population       P_2      (1/6)P_3    (7/12)P_4
               P_1       (million).
                             Y

    1811       - 6         10.16           22        - 11          99
    1821       - 5         12.00           11           0        - 66
    1831       - 4         13.90            2           6        - 96
    1841       - 3         15.91          - 5           8        - 54
    1851       - 2         17.93         - 10           7          11
    1861       - 1         20.07         - 13           4          64
    1871         0         22.71         - 14           0          84
    1881         1         25.97         - 13         - 4          64
    1891         2         29.00         - 10         - 7          11
    1901         3         32.53          - 5         - 8        - 54
    1911         4         36.07            2         - 6        - 96
    1921         5         37.89           11           0        - 66
    1931         6         39.95           22          11          99

In Table 22.2, column (3) shows the population of England and Wales (in millions)
for the years shown in column (1). These are at ten-yearly intervals, and the variate-values
in units of 10 with origin at the mid-point of the range are given in column (2). These
are the values of $P_1$.

The corresponding values of $P_2$, $P_3$ and $P_4$ are given in the last three columns. They
may be calculated direct from (22.74), but are most conveniently taken direct from the
Fisher-Yates tables.

We find, for $n = 13$,

$$\Sigma\,YP_1 = 474.77$$
$$\Sigma\,YP_2 = 123.19$$
$$\Sigma\,YP_3 = -39.38 \times 6 = -236.28$$
$$\Sigma\,YP_4 = -374.30 \times \tfrac{12}{7} = -641.657{,}143,$$

and, direct from the tables,

$$\Sigma\,P_1^2 = 182,\qquad \Sigma\,P_2^2 = 2002,\qquad \Sigma\,P_3^2 = 572 \times 36,\qquad \Sigma\,P_4^2 = 68{,}068 \times \left(\tfrac{12}{7}\right)^2.$$

Hence, from equations of the type $b_j = \Sigma\,YP_j / \Sigma\,P_j^2$, we find

$$b_1 = 2.608{,}626,\quad b_2 = 0.061{,}533{,}467,\quad b_3 = -0.011{,}474{,}359,\quad b_4 = -0.003{,}207{,}699,$$

and the quartic curve is

$$Y - 24.1608 = 2.6086X + 0.061{,}53\,(X^2 - 14) - 0.011{,}47\,(X^3 - 25X) - 0.003{,}208\left(X^4 - \tfrac{247}{7}X^2 + 144\right). \tag{22.78}$$

We can now find the residuals for each term in this equation. We find

$$\Sigma\,Y^2 = 8839.9389,\qquad \Sigma\,Y = 314.09.$$
Hence the sum of squares of $Y$ about the mean of $Y$,

$$\Sigma\,(Y - \bar{Y})^2 = 1251.283.$$

Thus we have:

                                                                Residual Sum
                                                                 of Squares.
    Original variation                             1251.283
    Contribution of first term  = b_1 Σ(YP_1)      1238.497        12.786
    Contribution of second term = b_2 Σ(YP_2)         7.580         5.206
    Contribution of third term  = b_3 Σ(YP_3)         2.711         2.495
    Contribution of fourth term = b_4 Σ(YP_4)         2.058         0.437

For the variance of the residual elements we divide by the number of degrees of freedom
$(n - j - 1)$ and obtain

    Residual Sum of Squares.     Divisor.     Residual Variance.
            12.786                  11              1.162
             5.206                  10              0.521
             2.495                   9              0.277
             0.437                   8              0.055


Fig. 22.2 shows the data graphically with the cubic and quartic of closest fit. 



Fig. 22.2. Cubic (full line) and Quartic (broken line) Parabolas fitted to the Data of Table 22.2. (Horizontal axis: Years.)

The fit is evidently a good one, as is borne out by the smallness of the residual variance, 
but we must sound a warning as to the use of this polynomial. For interpolation in the 
variate range it would probably suit very well ; but for extrapolation outside the range 
it is dangerous unless there is good reason to suppose that the polynomial has some theoretical 
basis (which is not so). It would, for instance, be most unsafe to try and estimate the
population in 1960 by inserting $X = 9$ in equation (22.78).
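
The whole of the arithmetic of this example can be retraced in a few lines. The sketch below takes the populations and the tabulated polynomial values from Table 22.2, restores the multipliers 6 and 12/7 for the third and fourth polynomials, and forms each coefficient as the sum of products of Y with the polynomial divided by the sum of squares of the polynomial. The variable names are illustrative assumptions.

```python
Y = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
     25.97, 29.00, 32.53, 36.07, 37.89, 39.95]
P1 = list(range(-6, 7))
P2 = [22, 11, 2, -5, -10, -13, -14, -13, -10, -5, 2, 11, 22]
# Tabulated columns are P3/6 and (7/12)P4; restore the polynomials proper
P3 = [6 * v for v in [-11, 0, 6, 8, 7, 4, 0, -4, -7, -8, -6, 0, 11]]
P4 = [12 / 7 * v for v in
      [99, -66, -96, -54, 11, 64, 84, 64, 11, -54, -96, -66, 99]]

n = len(Y)
mean = sum(Y) / n
residual = sum((y - mean) ** 2 for y in Y)   # original variation
rows = []
for j, P in enumerate([P1, P2, P3, P4], start=1):
    s_yp = sum(y * p for y, p in zip(Y, P))
    b = s_yp / sum(p * p for p in P)
    residual -= b * s_yp                      # remove contribution of term j
    rows.append((j, b, residual, residual / (n - j - 1)))

for j, b, rss, var in rows:
    print(j, round(b, 6), round(rss, 3), round(var, 3))
```

Carried out in full precision, the residual sums of squares agree with those tabulated above to within one or two units in the third decimal place, the discrepancy being attributable to rounding in the hand computation.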
Example 22.8

In Chapter 3 it was seen that factorial moments can be derived by summatory processes.
A somewhat similar method can be used to fit orthogonal polynomials. We will
illustrate it on the data of the previous example.

TABLE 22.3

Fitting of Orthogonal Polynomials by Factorial Sums.

       S_0          S_1          S_2          S_3

      10.16        10.16        10.16        10.16
      12.00        22.16        32.32        42.48
      13.90        36.06        68.38       110.86
      15.91        51.97       120.35       231.21
      17.93        69.90       190.25       421.46
      20.07        89.97       280.22       701.68
      22.71       112.68       392.90      1094.58
      25.97       138.65       531.55      1626.13
      29.00       167.65       699.20      2325.33
      32.53       200.18       899.38      3224.71
      36.07       236.25      1135.63      4360.34
      37.89       274.14      1409.77      5770.11
      39.95       314.09      1723.86      7493.97

     314.09      1723.86      7493.97

In Table 22.3 the column headed $S_0$ gives the value of $Y$. The next column, headed
$S_1$, gives the sums of the values in the first column proceeding from the top; and so for
the columns headed $S_2$ and $S_3$.
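Now construct the quantities ($S_1$, $S_2$, $S_3$ here denoting the totals at the foot of the table)

$$a_0 = \frac{S_1}{n} = \frac{314.09}{13} = 24.160{,}769$$

$$a_1 = \frac{2!\,S_2}{n(n + 1)} = \frac{2\,(1723.86)}{182} = 18.943{,}516$$

$$a_2 = \frac{3!\,S_3}{n(n + 1)(n + 2)} = \frac{6\,(7493.97)}{2730} = 16.470{,}264,$$

the general formula being

$$a_j = \frac{(j + 1)!\;S_{j+1}}{n(n + 1)\cdots(n + j)}. \tag{22.79}$$

Then obtain the quantities

$$a'_0 = a_0 = 24.160{,}769$$

$$a'_1 = a_0 - a_1 = 5.217{,}253$$

$$a'_2 = a_0 - 3a_1 + 2a_2 = 0.270{,}749,$$

the general formula being

$$a'_p = a_0 - \frac{p(p + 1)}{(1!)^2\,2}\,a_1 + \frac{(p - 1)\,p\,(p + 1)(p + 2)}{(2!)^2\,3}\,a_2 - \ldots \tag{22.80}$$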
Finally put

$$b_0 = a'_0 = 24.160{,}769$$

$$b_1 = \frac{6}{n - 1}\,a'_1 = \frac{6\,(5.217{,}253)}{12} = 2.608{,}626$$

$$b_2 = \frac{30}{(n - 1)(n - 2)}\,a'_2 = \frac{30\,(0.270{,}749)}{132} = 0.061{,}534,$$

the general formula being

$$b_p = \frac{(2p + 1)!}{(p!)^2\,(n - 1)\cdots(n - p)}\,a'_p. \tag{22.81}$$

Then the $b$’s are the coefficients of the orthogonal polynomials in the regression equation.
The values we have found check with those of the previous example and the reader may
care to work out $b_3$ and $b_4$ by the same method.

This process is due to R. A. Fisher and avoids the direct calculation of the values of
the orthogonal polynomials. Its validity may be established by using equations (22.75)
and (22.73), which give

$$b_p = \frac{\Sigma\,(y P_p)}{\Sigma\,P_p^2} = \frac{(2p)!\,(2p + 1)!}{(p!)^4\,n\,(n^2 - 1)\cdots(n^2 - p^2)}\,\Sigma\,(y P_p)$$

$$= \frac{(2p + 1)!}{(p!)^2\,(n - 1)\cdots(n - p)}\;\sum_{j}\frac{(-1)^{p-j}\,(p + j)!}{(j!)^2\,(p - j)!\,(j + 1)}\;\frac{(n - j - 1)!\,(j + 1)\,\Sigma\,y\,x(x - 1)\cdots(x - j + 1)}{(n - p - 1)!\;n\,(n + 1)\cdots(n + p)}.$$

The first part of the expression explains the coefficients in (22.81), the second part those
in (22.80). The third part gives rise to (22.79) when it is remembered that the sums $S$
are expressible as sums of factorials (cf. 3.10, vol. I, p. 58), but the summation takes place
from the top of the column.
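
The summation process lends itself to mechanical computation. The sketch below forms the successive cumulative columns for the population data, then the a’s, the primed a’s and finally the coefficients b. The general term used for the primed quantities, (-1)^j (p + j)! / {(p - j)! (j!)^2 (j + 1)}, is the one appearing in the second part of the expression just given, and the multiplier (2p + 1)! / {(p!)^2 (n - 1)...(n - p)} is inferred from the worked values 6/(n - 1) and 30/{(n - 1)(n - 2)}. The function name `fit_by_sums` is an assumption for illustration.

```python
from itertools import accumulate
from math import factorial

Y = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
     25.97, 29.00, 32.53, 36.07, 37.89, 39.95]


def fit_by_sums(Y, order):
    """Coefficients b_0..b_order of the orthogonal polynomials by summation."""
    n = len(Y)
    # a_j = (j+1)! S_{j+1} / (n(n+1)...(n+j)); S_{j+1} is the foot of the
    # (j+1)-th cumulative column, summed from the top
    a, col = [], list(Y)
    for j in range(order + 1):
        col = list(accumulate(col))
        denom = 1
        for i in range(j + 1):
            denom *= (n + i)
        a.append(factorial(j + 1) * col[-1] / denom)
    b = []
    for p in range(order + 1):
        a_prime = sum(
            (-1) ** j * factorial(p + j)
            / (factorial(p - j) * factorial(j) ** 2 * (j + 1)) * a[j]
            for j in range(p + 1)
        )
        denom = 1
        for i in range(1, p + 1):
            denom *= (n - i)
        b.append(factorial(2 * p + 1) / (factorial(p) ** 2 * denom) * a_prime)
    return b


b = fit_by_sums(Y, 4)
print([round(v, 6) for v in b])
```

The coefficients so obtained agree with those of Example 22.7, including the third and fourth, which the text leaves to the reader.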

Example 22.9 

As a rule it is unnecessary to evaluate the polynomial at all the points for which data 
are given ; but if the values are desired for comparison with observation they may be 
obtained by summatory processes from the differences. 

The terminal differences themselves are obtainable simply from the quantities $a'_p$ of
the previous example. For a polynomial of the first degree we have

$$\Delta Y = -\frac{6}{n - 1}\,a'_1,\qquad Y = a'_0 + 3a'_1. \tag{22.82}$$

For that of the second degree,

$$\Delta^2 Y = \frac{60}{(n - 1)(n - 2)}\,a'_2,\qquad \Delta Y = -\frac{6}{n - 1}\,(a'_1 + 5a'_2),\qquad Y = a'_0 + 3a'_1 + 5a'_2. \tag{22.83}$$

For the third degree,

$$\Delta^3 Y = -\frac{840}{(n - 1)(n - 2)(n - 3)}\,a'_3,\qquad \Delta^2 Y = \frac{60}{(n - 1)(n - 2)}\,(a'_2 + 7a'_3),$$

$$\Delta Y = -\frac{6}{n - 1}\,(a'_1 + 5a'_2 + 14a'_3),\qquad Y = a'_0 + 3a'_1 + 5a'_2 + 7a'_3. \tag{22.84}$$
The formulae for higher degrees are constructed on analogous lines, the multiplying
factors for successive differences being given by

$$\frac{(-1)^p\,(p + 1)(p + 2)\cdots(2p + 1)}{(n - 1)(n - 2)\cdots(n - p)}$$

and the coefficients of the $a'$’s by

                a'_0    a'_1    a'_2    a'_3    a'_4    a'_5

    Y             1       3       5       7       9      11
    ΔY                    1       5      14      30      55
    Δ^2 Y                         1       7      27      77
    Δ^3 Y                                 1       9      44
    Δ^4 Y                                         1      11
    Δ^5 Y                                                 1

We leave the proof of these results to the reader. 

For instance, for the data considered in the two previous examples we found, for the 
parabola of the second degree, 

$$Y = 24.160{,}8 + 2.608{,}6X + 0.061{,}533\,(X^2 - 14),$$

$$a'_0 = 24.160{,}769;\qquad a'_1 = 5.217{,}253;\qquad a'_2 = 0.270{,}749.$$

Hence, from (22.83),

$$\Delta^2 Y = \frac{60}{(n - 1)(n - 2)}\,a'_2 = 0.123{,}068$$

$$\Delta Y = -\frac{6}{n - 1}\,(a'_1 + 5a'_2) = -3.285{,}499$$

$$Y = a'_0 + 3a'_1 + 5a'_2 = 41.166{,}273.$$

We then build up the polynomial values as shown in Table 22.4. The second difference
0.123,068 is shown at the foot of column (2). Being a constant, it could have been written
all the way up, but to do so is a waste of time (and in practice, of course, we should not
devote a separate column to it). The first difference is shown at the foot of column (3),
and the figures above it constructed by adding the second difference at each stage. The
polynomial values themselves are compiled by adding the first differences to the value
at the foot of the column, 41.166,27.

TABLE 22.4

Calculation of Polynomial Values from Differences.

      (1)          (2)            (3)            (4)           (5)          (6)
    Number of     Second         First        Polynomial     Observed    Difference
     Term.      Difference.    Difference.      Value.        Value.      (5)-(4)

       1                       - 1.808,68        9.863        10.16        0.297
       2                       - 1.931,75       11.795        12.00        0.205
       3                       - 2.054,82       13.849        13.90        0.051
       4                       - 2.177,88       16.027        15.91      - 0.117
       5                       - 2.300,95       18.328        17.93      - 0.398
       6                       - 2.424,02       20.752        20.07      - 0.682
       7                       - 2.547,09       23.299        22.71      - 0.589
       8                       - 2.670,16       25.969        25.97        0.001
       9                       - 2.793,23       28.763        29.00        0.237
      10                       - 2.916,29       31.679        32.53        0.851
      11                       - 3.039,36       34.718        36.07        1.352
      12                       - 3.162,43       37.881        37.89        0.009
      13         0.123,068     - 3.285,499      41.166,27     39.95      - 1.216

We have also shown the observed values and the difference between polynomial and
observed values. The sum of squares of the latter is 5.204, agreeing within the margin
of rounding-up error with the value for the sum of squares of residuals found in
Example 22.7.

As an exercise the reader should work out the polynomial values for the third- and 
fourth-order polynomials and compare the sum of squares of residuals with the values of 
Example 22.7. 
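
The building-up of polynomial values from the terminal differences may be sketched as follows for the parabola of the second degree. The starting value and the two differences are formed from the primed quantities 24.160,769, 5.217,253 and 0.270,749 of Example 22.8, as in the second-degree formulae of this example; the variable names are illustrative.

```python
Y_obs = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
         25.97, 29.00, 32.53, 36.07, 37.89, 39.95]
n = len(Y_obs)
a0p, a1p, a2p = 24.160769, 5.217253, 0.270749   # primed a's, Example 22.8

d2 = 60 * a2p / ((n - 1) * (n - 2))        # constant second difference
d1 = -6 * (a1p + 5 * a2p) / (n - 1)        # terminal first difference
y_last = a0p + 3 * a1p + 5 * a2p           # polynomial value at the last term

# Work upwards from the foot of the column, as in Table 22.4
values = [y_last]
diff = d1
for _ in range(n - 1):
    values.append(values[-1] + diff)
    diff += d2
values.reverse()                           # now ordered from term 1 to term n

rss = sum((obs - fit) ** 2 for obs, fit in zip(Y_obs, values))
print([round(v, 3) for v in values])
print(round(rss, 3))
```

Only additions are required once the terminal values are known, which is the practical attraction of the method when the polynomial must be evaluated at every point.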

Multiple Curvilinear Regression 

22.26. We considered the linear regression of one variate on a number of others
in Chapters 14 and 16. There now remains the extension of our results to the
curvilinear case.

The extension is very easy to carry out when we remember that in multiple linear
regression there is no restriction on the degree of dependence among the “independent”
variates. In particular, some of them may be functionally related, and more particularly
still, one variate may be a power of another. It is thus clear that the process of fitting
curved regression lines can be regarded as formally equivalent to that of fitting linear
regressions. For instance, the fitting of

$$Y = a_0 + a_1 X_1 + a_2 X_2 + a_3 X_3 + a_4 X_4 + a_5 X_5$$

is equivalent to

$$Y = a_0 + a_1 X_1 + a_2 X_1^2 + a_3 X_3 + a_4 X_3^2 + a_5 X_3^3,$$

the latter being a particular case of the former where $X_2$ is the square of $X_1$ (and their
covariation accordingly complete) and similar relations exist between $X_3$, $X_4$ and $X_5$.

The case of curvilinear regression for a single variate, which has occupied the foregoing
part of the chapter, could then have been treated by the methods of Chapter 15.
We have discussed it afresh only because it is more easily dealt with by direct methods.

22.27. In multiple regression analysis it sometimes happens that, having worked out
a regression equation, we wish either to take account of a new factor or to remove one
which appears redundant. To avoid the necessity of solving a new set of determinantal
equations the following device is useful:

Consider the case of three independent variates measured from their mean

$$Y = b_1 X_1 + b_2 X_2 + b_3 X_3. \tag{22.85}$$

In accordance with our general method the constants $b$ are given by

$$\left.\begin{aligned}
b_1\,\Sigma(x_1^2) + b_2\,\Sigma(x_1 x_2) + b_3\,\Sigma(x_1 x_3) &= \Sigma(x_1 y)\\
b_1\,\Sigma(x_1 x_2) + b_2\,\Sigma(x_2^2) + b_3\,\Sigma(x_2 x_3) &= \Sigma(x_2 y)\\
b_1\,\Sigma(x_1 x_3) + b_2\,\Sigma(x_2 x_3) + b_3\,\Sigma(x_3^2) &= \Sigma(x_3 y)
\end{aligned}\right\} \tag{22.86}$$

Suppose now we replace the functions $\Sigma(xy)$ on the right by 1, 0, 0 and obtain the solutions
$b_1 = c_{11}$, $b_2 = c_{12}$, $b_3 = c_{13}$; and similarly for replacement by 0, 1, 0 and 0, 0, 1,
the solutions being written

$$\left.\begin{aligned}
b_1 &= c_{11},\ c_{12},\ c_{13}\\
b_2 &= c_{12},\ c_{22},\ c_{23}\\
b_3 &= c_{13},\ c_{23},\ c_{33}
\end{aligned}\right\} \tag{22.87}$$

Then the solution of (22.86) is

$$\left.\begin{aligned}
b_1 &= c_{11}\,\Sigma(x_1 y) + c_{12}\,\Sigma(x_2 y) + c_{13}\,\Sigma(x_3 y)\\
b_2 &= c_{12}\,\Sigma(x_1 y) + c_{22}\,\Sigma(x_2 y) + c_{23}\,\Sigma(x_3 y)\\
b_3 &= c_{13}\,\Sigma(x_1 y) + c_{23}\,\Sigma(x_2 y) + c_{33}\,\Sigma(x_3 y)
\end{aligned}\right\} \tag{22.88}$$

as is immediately evident on substitution. The values of the c’s are those we have denoted 
earlier in the chapter by determinantal forms, e.g. 

22.28. Now suppose that we wish to discard the variate $x_3$. From (22.86), with
1, 0, 0 written on the right, we find

$$c_{12} = -\frac{1}{\Delta}\begin{vmatrix}(11) & (13) & 1\\ (12) & (23) & 0\\ (13) & (33) & 0\end{vmatrix}, \tag{22.89}$$

where $(jk)$ stands for $\Sigma\,(x_j x_k)$, and

$$\Delta = \begin{vmatrix}(11) & (12) & (13)\\ (12) & (22) & (23)\\ (13) & (23) & (33)\end{vmatrix}. \tag{22.90}$$

There are similar expressions for the other $c$’s. If the values of the constants when $x_3$
is removed are $c'_{11}$, $c'_{12}$, $c'_{22}$ we shall have


where 

Now we have 


^13 ^23 
Cqq 


(12) 

1 

/ 

1 1 (11) 

1 

(22) 

0 ’ 

C12 — 

A' (12) 

0 

A' 


(11) 

(12) 




(12) 

(22) • 


(11) 

(12) 

1 

(11) 

(12) 

(12) 

(22) 

0 

(12) 

(22) 

(13) 

(23) 

0 

(13) 

(23) 


(11) 

(12) 

0 


A 

(12) 

(22) 

0 



(13) 

(23) 

1 


(12) 

(22) 

(11) 

(12) 


(13) 

(23) 

(13) 

(23) 



Thus 

^12 


^13 ^23 ^12 ^^33 ^13 ^23 

^33 C33 


(22.91) 


(22.92) 


(12) 

(23) 

(11) 

(13) 

(12) 

(22) 

(11) 

(12) 

(13) 

(33) 

(12) 

(22) 

(13) 

(23) 

(13) 

(23) 

AA' 


(12) d 
AA' 


■ 12 - 


. (22.93) 
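Similarly

$$c'_{11} = c_{11} - \frac{c_{13}^2}{c_{33}}. \tag{22.94}$$

This gives us the new $c$’s in terms of the old. Denoting similarly the new $b$’s by primes,
we have

$$b_1 - b'_1 = (c_{11} - c'_{11})\,\Sigma(x_1 y) + (c_{12} - c'_{12})\,\Sigma(x_2 y) + c_{13}\,\Sigma(x_3 y) \tag{22.95}$$

$$= \frac{1}{c_{33}}\,\{c_{13}^2\,\Sigma(x_1 y) + c_{13}c_{23}\,\Sigma(x_2 y) + c_{13}c_{33}\,\Sigma(x_3 y)\} = \frac{c_{13}}{c_{33}}\,b_3.$$

Hence we have

$$\left.\begin{aligned}
b'_1 &= b_1 - \frac{c_{13}}{c_{33}}\,b_3\\
b'_2 &= b_2 - \frac{c_{23}}{c_{33}}\,b_3
\end{aligned}\right\} \tag{22.96}$$

expressing the new constants in terms of the old and the known constants $c$.
Finally, the contribution to the sum of squares due to the variate $x_3$ is

$$b_1\,\Sigma(x_1 y) + b_2\,\Sigma(x_2 y) + b_3\,\Sigma(x_3 y) - b'_1\,\Sigma(x_1 y) - b'_2\,\Sigma(x_2 y)$$

$$= \frac{c_{13}}{c_{33}}\,b_3\,\Sigma(x_1 y) + \frac{c_{23}}{c_{33}}\,b_3\,\Sigma(x_2 y) + b_3\,\Sigma(x_3 y) = \frac{b_3^2}{c_{33}}. \tag{22.97}$$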


22.29. Generally, if there are $p$ independent variates the equations for the $b$’s are

$$b_1\,\Sigma(x_1^2) + b_2\,\Sigma(x_1 x_2) + \ldots + b_p\,\Sigma(x_1 x_p) = \Sigma(y x_1)$$

$$\cdots$$

$$b_1\,\Sigma(x_1 x_p) + b_2\,\Sigma(x_2 x_p) + \ldots + b_p\,\Sigma(x_p^2) = \Sigma(y x_p).$$

If $x_p$ is omitted the equations become $(p - 1)$ in number in variables $b'_1, \ldots, b'_{p-1}$.
Subtracting from these the first $(p - 1)$ of the above equations we find $(p - 1)$ equations,
typified by

$$(b'_1 - b_1)\,\Sigma(x_1 x_j) + (b'_2 - b_2)\,\Sigma(x_2 x_j) + \ldots + (b'_{p-1} - b_{p-1})\,\Sigma(x_{p-1} x_j) - b_p\,\Sigma(x_j x_p) = 0. \tag{22.98}$$

But these equations are the same as those for the coefficients $c_{1p}, \ldots, c_{pp}$ with $(b'_1 - b_1)$
in place of $c_{1p}$, etc., and $-b_p$ in place of $c_{pp}$. Hence

$$\frac{b'_j - b_j}{b_p} = -\frac{c_{jp}}{c_{pp}}$$

or

$$b'_j - b_j = -\frac{c_{jp}}{c_{pp}}\,b_p. \tag{22.99}$$
Similarly it will be found that

$$\left.\begin{aligned}
c'_{11} &= c_{11} - \frac{c_{1p}^2}{c_{pp}}\\
c'_{12} &= c_{12} - \frac{c_{1p}\,c_{2p}}{c_{pp}}
\end{aligned}\right\} \tag{22.100}$$

with similar equations for the other $c$’s.
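
The rule for discarding a variate can be verified numerically. The sketch below uses a small, purely hypothetical set of sums of squares and products, obtains the c’s as the elements of the reciprocal matrix, and then removes the last variate by the adjustment formulae, subtracting the product of the two relevant c’s divided by the diagonal c of the discarded variate, and correspondingly for the b’s. The helper names `inverse` and `drop_last` are illustrative assumptions.

```python
from fractions import Fraction


def inverse(m):
    """Gauss-Jordan inverse of a small square matrix, in exact fractions."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def drop_last(c, b):
    """Adjust the c's and b's when the last variate is discarded."""
    p = len(b) - 1
    cpp = c[p][p]
    new_c = [[c[i][j] - c[i][p] * c[j][p] / cpp for j in range(p)]
             for i in range(p)]
    new_b = [b[i] - c[i][p] * b[p] / cpp for i in range(p)]
    return new_c, new_b, b[p] ** 2 / cpp   # last item: sum of squares removed


# Hypothetical sums of squares and products, for illustration only
S = [[4, 2, 1], [2, 5, 3], [1, 3, 6]]
sxy = [3, 1, 2]
c = inverse(S)
b = [sum(c[i][j] * sxy[j] for j in range(3)) for i in range(3)]
new_c, new_b, ss_removed = drop_last(c, b)
print(new_b, ss_removed)
```

The adjusted values coincide exactly with those obtained by re-solving the reduced equations, and the quantity returned last equals the reduction in the regression sum of squares.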

22.30. Somewhat similar results apply when a variate is added. If primes again refer to new coefficients when x_q is added, we have, as above:

\[ b'_1 = b_1 + \frac{c'_{1q}\,b'_q}{c'_{qq}} , \qquad c'_{11} = c_{11} + \frac{c'^{\,2}_{1q}}{c'_{qq}} , \qquad c'_{12} = c_{12} + \frac{c'_{1q}\,c'_{2q}}{c'_{qq}} . \tag{22.101} \]

In order to use these equations to adjust the constants we require c'_{1q}, . . ., c'_{qq} and b'_q.

By writing down the equations satisfied by c'_{11} . . . c'_{1p} and subtracting the corresponding equations in c_{11} . . . c_{1p} we get p equations such as

\[ (c'_{11} - c_{11})\Sigma(x_1 x_j) + \dots + (c'_{1p} - c_{1p})\Sigma(x_p x_j) = -c'_{1q}\,\Sigma(x_j x_q). \]

These are the same as the equations in b_1 . . . b_p with -c'_{1q} Σ(x_j x_q) instead of Σ(x_j y) on the right, and hence

\[ c'_{1p} - c_{1p} = -c'_{1q}\sum_{j=1}^{p} c_{pj}\,\Sigma(x_j x_q). \]

Thus, using (22.101),

\[ \frac{c'_{pq}}{c'_{qq}} = -\sum_{j=1}^{p} c_{pj}\,\Sigma(x_j x_q). \tag{22.102} \]

The last of the equations satisfied by c'_{1q} . . . c'_{qq} is

\[ c'_{1q}\Sigma(x_1 x_q) + \dots + c'_{pq}\Sigma(x_p x_q) + c'_{qq}\Sigma(x_q^2) = 1. \]

Substituting for c'_{1q}, etc., in terms of c'_{qq} we get

\[ \frac{1}{c'_{qq}} = \Sigma(x_q^2) - \sum_{j,k=1}^{p} c_{jk}\,\Sigma(x_j x_q)\,\Sigma(x_k x_q). \tag{22.103} \]

This gives c'_{qq}, and c'_{1q} . . . c'_{pq} are derivable from (22.102). The other constants then result from (22.101).

Where one variate is eliminated the method is quicker than re-solving the regression equations [words illegible in source] where there are only two independent variates in the first instance; and where [number illegible in source] variates are being eliminated the method is quicker if the original number of variates is [number illegible in source] or more. For the addition of variates the method is in all cases more expeditious than re-solving the regression equations.





Example 22.10 (Cochran, 1938a)

In a study of the effect of weather factors on the number of noctuid moths per night caught in a light-trap, regressions were worked out on x_1 (minimum night temperature), x_2 (the maximum temperature of the previous day), x_3 (the average speed of the wind during the night), and x_4 (the amount of rain during the night). The dependent variate was log (1 + n), where n was the number of moths.

It was subsequently decided to investigate the effect of cloudiness, measured on a conventional scale as the percentage of starlight obscured by clouds in a night sky camera. This is the new variate x_5.

The quantities c_{jk} for the first four variates were:

            x_1              x_2              x_3              x_4
  x_1   + 0.105,423,56   - 0.041,946,20   - 0.096,067,09   - 0.018,490,96
  x_2        ...         + 0.086,038,69   + 0.033,172,71   + 0.012,903,58
  x_3        ...              ...         + 0.572,652,01   + 0.008,116,62
  x_4        ...              ...              ...         + 0.062,275,32

and the sums Σ(x_j x_5) were

  Σ(x_1 x_5) = - 4.867,   Σ(x_2 x_5) = + 0.206,   Σ(x_3 x_5) = - 0.5446,
  Σ(x_4 x_5) = - 5.42,    Σ(x_5^2) = 7.87.

We then find from (22.103)

  c'_{55} = + 0.210,133,14,

and from (22.102)

  c'_{15}/c'_{55} = + 0.369,198,24,   c'_{25}/c'_{55} = - 0.133,872,86,
  c'_{35}/c'_{55} = - 0.118,533,74,   c'_{45}/c'_{55} = + 0.249,298,91,

so that the new c's are given by (22.101) as

            x_1              x_2              x_3              x_4              x_5
  x_1   + 0.134,066,25   - 0.052,332,16   - 0.105,263,03   + 0.000,849,84   + 0.077,580,79
  x_2        ...         + 0.089,804,68   + 0.036,507,20   + 0.005,890,52   - 0.028,131,12
  x_3        ...              ...         + 0.575,604,43   + 0.001,907,12   - 0.024,907,87
  x_4        ...              ...              ...         + 0.075,335,08   + 0.052,385,96
  x_5        ...              ...              ...              ...         + 0.210,133,14

The original regression coefficients were

  b_1 = + 0.198,140,7,   b_2 = + 0.038,528,4,   b_3 = - 0.508,649,2,   b_4 = + 0.031,848,2.

We now find

  b'_5 = \sum_{j=1}^{5} c'_{j5}\,\Sigma(x_j y) = - 0.227,149,6,

and from (22.101) we then have

  b'_1 = + 0.114,277,5,   b'_2 = + 0.068,937,6,   b'_3 = - 0.481,724,3,   b'_4 = - 0.024,779,9.

As usual we have retained more figures than are necessary, in order to avoid cumulating
errors and to facilitate the detection of computational slips.
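The arithmetic of Example 22.10 can be retraced mechanically. The sketch below, in Python, applies (22.102), (22.103) and (22.101) in turn to the printed figures. Two points are assumptions of the sketch rather than statements of the text: the signs of Σ(x_2 x_5) and Σ(x_3 x_5) are taken as + and - respectively, the choice that reproduces the printed ratios, and b'_5 is quoted from the example instead of being recomputed, since the sums Σ(x_j y) are not given.

```python
# c_jk for the four original variates (symmetric), as printed in the example.
c = [
    [+0.10542356, -0.04194620, -0.09606709, -0.01849096],
    [-0.04194620, +0.08603869, +0.03317271, +0.01290358],
    [-0.09606709, +0.03317271, +0.57265201, +0.00811662],
    [-0.01849096, +0.01290358, +0.00811662, +0.06227532],
]
s_j5 = [-4.867, +0.206, -0.5446, -5.42]   # S(x_j x_5), j = 1..4 (signs assumed)
s_55 = 7.87                               # S(x_5^2)
b_old = [0.1981407, 0.0385284, -0.5086492, 0.0318482]
b5_new = -0.2271496                       # quoted from the example, not recomputed

p = len(c)

# (22.102): the ratios c'_{j5} / c'_{55}.
ratios = [-sum(c[j][k] * s_j5[k] for k in range(p)) for j in range(p)]

# (22.103): 1 / c'_{55} = S(x_5^2) - sum_{j,k} c_jk S(x_j x_5) S(x_k x_5).
quad = sum(c[j][k] * s_j5[j] * s_j5[k] for j in range(p) for k in range(p))
c55_new = 1.0 / (s_55 - quad)

# (22.101): adjusted c's and b's for the original four variates.
c_new = [[c[j][k] + ratios[j] * ratios[k] * c55_new for k in range(p)]
         for j in range(p)]
b_new = [b_old[j] + ratios[j] * b5_new for j in range(p)]

for j in range(p):
    print(f"b'_{j + 1} = {b_new[j]:+.7f}")
```

The printed coefficients should agree with those of the example to about six decimal places; small discrepancies in the last figure reflect the rounding of the inputs.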
22.31. The constants c found in the foregoing method have a further use; they give the standard errors of the regression coefficients and provide some of the functions required in more exact tests based on the t-distribution. If, measuring y about the mean, we have

\[ Y = b_1 x_1 + b_2 x_2 + \dots + b_p x_p , \]

then there are p equations of the kind:

\[ \Sigma(x_1 y) = b_1\Sigma(x_1^2) + b_2\Sigma(x_1 x_2) + \dots + b_p\Sigma(x_1 x_p), \]

and thus, recalling the definition of the c's, we have

\[ b_1 = c_{11}\Sigma(x_1 y) + c_{12}\Sigma(x_2 y) + \dots + c_{1p}\Sigma(x_p y). \]

Thus, for fixed values of the x's,

\[ \operatorname{var} b_1 = \operatorname{var} y \sum_{j,k} c_{1j}\,c_{1k}\,\Sigma(x_j x_k) = c_{11}\operatorname{var} y , \tag{22.104} \]

and so for the other b's.

For large samples var y may be taken to be the estimated variance

\[ \frac{1}{n - p - 1}\,\Sigma(y - Y)^2 . \]

If the sample is small and it is desired to make a more accurate test, then we have, by an extension of 22.21, that

\[ t = \frac{(b_1 - \beta_1)\sqrt{n - p - 1}}{\sqrt{c_{11}\,\Sigma(y - Y)^2}} \tag{22.105} \]

is distributed in "Student's" form with ν = n - p - 1 degrees of freedom.
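As a small illustration of (22.104) and (22.105), the following Python sketch computes the standard error and the t-ratio for one coefficient. The function name `coefficient_t` and all the numerical inputs are hypothetical, and the residual degrees of freedom are taken as n - p - 1, the divisor of the estimated variance above; that choice is an assumption of the sketch.

```python
import math


def coefficient_t(b_j, beta_j, c_jj, resid_ss, n, p):
    """Return (t, standard error, degrees of freedom) for one coefficient.

    c_jj     -- diagonal element of the inverse matrix for this variate
    resid_ss -- residual sum of squares S(y - Y)^2
    n, p     -- number of observations and of independent variates
    """
    dof = n - p - 1                      # assumed residual degrees of freedom
    var_y = resid_ss / dof               # estimated variance of y
    std_err = math.sqrt(c_jj * var_y)    # square root of (22.104)
    return (b_j - beta_j) / std_err, std_err, dof


# Hypothetical figures, for illustration only.
t, std_err, dof = coefficient_t(b_j=0.5, beta_j=0.0, c_jj=0.04,
                                resid_ss=90.0, n=35, p=4)
print(f"t = {t:.4f} on {dof} d.f., standard error {std_err:.4f}")
```

The value of t would then be referred to tables of "Student's" distribution with the stated degrees of freedom.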


22.32. As a final comment we may emphasise that regression equations are only 
polynomials fitted to the means of arrays, and consequently that if the scatter about 
those means is substantial they are not very reliable as estimators (though they may be
better than other methods). The comment would hardly be necessary were it not for a
tendency to use the equations somewhat uncritically for purposes of prediction. The 
point assumes even greater importance when attempts are made to estimate the dependent 
variate for values of the independent variates outside the range on which the regressions 
are based ; or again, if the observations are distributed over time so that the population 
may be changing while the sample is being drawn. The technique of regression analysis 
is undoubtedly useful in many fields, but — as with many other statistical techniques — 
the careful investigator will apply it with a certain amount of self-discipline. 


NOTES AND REFERENCES 

The theory of curvilinear regression was studied by Karl Pearson (1905). Orthogonal 
polynomials had been considered, and the essential problems solved, by Tchebycheff as 
far back as 1857, but their use in statistics was not fully appreciated until about sixty years 
later. Pearson gave in 1921 the general formulae for fitting curved regression lines up to 
the fourth order. Neyman (1926) pointed out the elegance of the determinantal approach. 

From about 1920 onwards there may be discerned two main lines of development.
The Scandinavian school, led by Wicksell, has developed the analytical theory of regression;
see Wicksell (1917b, 1933, 1934b) and a useful memoir by W. Andersson (1932). The
second line, followed by Fisher, Aitken and others, has been concerned with the fitting of
regression curves to arithmetical data and exact significance tests; see Fisher's papers of
1921b, 1922b, 1924b, 1926a, a paper by Allan (1930), and three papers by Aitken (1933a,
b, c). The literature on orthogonal polynomials is now very large.

For some illustrative material, see K. Pearson (1905), Andersson (1932), and Pretorius 
(1930). See also references to Chapters 14 and 15. 


EXERCISES 


22.1. Show that the regression of y on the variance of x (the scedastic curve) is given by

    [formula illegible in source]

(Wicksell, 1934b.)

22.2. Show that if the regression of y on the mean of x is linear, then from (22.11)

    [expression illegible in source]

is a linear function of [symbols illegible in source]. Hence that

    [relation illegible in source]

(Wicksell, 1934b.)

22.3. Show that if the marginal distribution of a bivariate distribution is of the Gram-Charlier Type A:

    [formula illegible in source]

the regression of y on x is

    [formula illegible in source]

(Wicksell, 1917b.)


22.4. Transforming the orthogonal polynomials of (22.74) to a new variate [definition illegible in source], note that [symbol illegible in source] is a numerical multiple of, say, P_p. Show that

    [relation illegible in source]

and deduce the recurrence relation,

\[ P_p = x P_{p-1} - \frac{(p-1)^2\{n^2 - (p-1)^2\}}{4(2p-1)(2p-3)}\,P_{p-2} . \]

(Allan, 1930. The relation is due to Tchebycheff.)

22.5. A regression line

\[ Y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 \]

is fitted to normal data and the number of observations N is large. If r is the correlation between the variates and c is [definition illegible in source] (the moments referring to the x-variate), show that

    [expressions for var a_0, var a_1, etc., illegible in source]

(Andersson, 1932.)


22.6. In the notation of 22.31 show that

\[ \operatorname{cov}(b_1, b_2) = c_{12}\operatorname{var} y \]

and hence show how to test the difference of two coefficients in a regression equation.

22.7. Show how to derive a test of the significance of the difference of corresponding
regression coefficients in two equations derived from independent samples, based on the
result of 21.26.



CHAPTER 23 


THE ANALYSIS OF VARIANCE— (1) 


23.1. At various points in this book we have encountered in different guises the 
result that the sum of squares of a set of observations about their mean can be represented 
as the sum of two independent sums of squares, each of which provides an estimate of 
the parent variance ; and that their ratio provides a test of homogeneity, at least when 
the parent is normal. We now proceed to study in more detail a method of statistical 
analysis with considerable generality which springs from this result. In view of the com- 
plexity of the general case we shall begin by considering simpler cases under somewhat 
restrictive conditions and shall extend our results stage by stage. 


One-way Classification

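23.2. Suppose we have a set of variate-values divided into p families:

\[ \begin{array}{cccc} x_{11} & x_{21} & \dots & x_{n_1 1} \\ x_{12} & x_{22} & \dots & x_{n_2 2} \\ \dots & & & \\ x_{1p} & x_{2p} & \dots & x_{n_p p} \end{array} \]

Denoting by \bar{x} the mean of the whole set and by \bar{x}_j the mean of the values in the jth family, we have the identity

\[ \sum_{i,j}(x_{ij} - \bar{x})^2 = \sum_{i,j}(x_{ij} - \bar{x}_j + \bar{x}_j - \bar{x})^2 = \sum_{i,j}(x_{ij} - \bar{x}_j)^2 + \sum_{i,j}(\bar{x}_j - \bar{x})^2 , \tag{23.1} \]

since the cross-product term \sum_{i,j}(x_{ij} - \bar{x}_j)(\bar{x}_j - \bar{x}) vanishes. We may also write this as

\[ \sum_{i,j}(x_{ij} - \bar{x})^2 = \sum_{i,j}(x_{ij} - \bar{x}_j)^2 + \sum_{j} n_j(\bar{x}_j - \bar{x})^2 , \tag{23.2} \]

where n_j is the number of members in the jth family.

It will also be convenient, from the point of view of a later generalisation, to write the mean of the jth family as x_{.j} and that of the whole as x_{..}, the periods in the subscripts showing which factor is being averaged. We have then the alternative form

\[ \sum_{i,j}(x_{ij} - x_{..})^2 = \sum_{i,j}(x_{ij} - x_{.j})^2 + \sum_{j} n_j(x_{.j} - x_{..})^2 . \tag{23.3} \]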


23.3. The problem we shall discuss in connection with families of values of this type 
takes some such form as the following : the members of each family are randomly chosen 
from some parent population corresponding to that family. The populations themselves 
are, as a rule, defined by some prior system of classification given among the data of the 
problem, e.g. they might be different varieties of wheat, the x’s being the yields of the 
varieties grown under similar conditions, or they might be defined by income levels and 
the x’s the expenditure on food of a sample chosen from the different income groups. We 
now ask: is there any evidence that the factor measured by x varies significantly from
family to family? Alternatively, can the data be regarded as homogeneous, i.e. as emanating
from populations which are identical so far as concerns the factor measured by x?
Further, when the question of significance is decided, how can we estimate the variation
of x in families or groups of families, and how can we estimate the magnitude of any
differences which exist?


23.4. We will assume, until further notice, that within each family the variation is normal with variance v, and that v is the same for each family. In later sections we shall endeavour to remove these rather restrictive conditions. On our present hypothesis the populations corresponding to the different families can differ, if at all, only in their means, and our first question is whether the sample values afford any evidence of such differences.

Let us take as our hypothesis that the parent populations have a common mean m. Then we recall the following facts:

(1) The sum \frac{1}{v}\sum(x_{ij} - x_{..})^2 is distributed in the Type III form of \chi^2 with N - 1 = \sum_j(n_j) - 1 degrees of freedom, that is to say as the sum of squares of N - 1 independent normal variates with zero mean and unit variance.

(2) In any given family x_{.j}\sqrt{n_j/v} is distributed normally with unit variance about mean m\sqrt{n_j/v}, and is independent of the sum \frac{1}{v}\sum_i(x_{ij} - x_{.j})^2, which is itself distributed as \chi^2 with n_j - 1 degrees of freedom.

Since on our hypothesis the observations may be regarded as a single sample from the same population, it follows that

\[ \left. \begin{array}{lll} \frac{1}{v}\sum_{i,j}(x_{ij} - x_{..})^2 & \text{is distributed as } \chi^2 \text{ with} & N - 1 \text{ d.f.} \\ \frac{1}{v}\sum_{i,j}(x_{ij} - x_{.j})^2 & \quad\text{,,}\qquad\text{,,} & \sum_j(n_j - 1) = N - p \text{ d.f.} \\ \frac{1}{v}\sum_{j} n_j(x_{.j} - x_{..})^2 & \quad\text{,,}\qquad\text{,,} & p - 1 \text{ d.f.} \end{array} \right\} \tag{23.4} \]

The only statement requiring any proof is the last. It may be proved directly (see Exercise 23.1), but we shall deduce it as the corollary of a general theorem due to R. A. Fisher which will often be required in this chapter.


23.5. Suppose we have q variates x_1 . . . x_q which are independently and normally distributed with unit variance about the same mean, which we may assume to be zero. Put

\[ \zeta_r = \sum_{s=1}^{q} \lambda_{rs}\,x_s . \tag{23.5} \]

If we choose the coefficients \lambda so that

\[ \sum_{s=1}^{q} \lambda_{rs}\lambda_{ts} = \begin{cases} 1, & r = t \\ 0, & r \neq t \end{cases} \tag{23.6} \]

then each \zeta is distributed normally with unit variance independently of the others. There are q^2 coefficients \lambda, and the equations (23.6) impose \tfrac{1}{2}q(q + 1) conditions on them, so that the \lambda's can always be found in a multiplicity of ways. In effect they correspond to the rotation of orthogonal co-ordinate axes in a q-dimensional space.

Now suppose that we have h linear functions of the x's, \zeta_1 . . . \zeta_h (h < q), whose coefficients obey the orthogonality relations (23.6). These h variates are then distributed independently, normally and with unit variance.

It is now possible to find q - h further variates \zeta_{h+1} . . . \zeta_q which are orthogonal among themselves and to \zeta_1 . . . \zeta_h. Geometrically this is evident from the possibilities of rotations in the q-way space. Algebraically it follows from the consideration that if qh of the \lambda's in (23.6) are known, q(q - h) are unknown, and the number of conditions they must obey is

\[ \tfrac{1}{2}q(q + 1) - \tfrac{1}{2}h(h + 1) = \tfrac{1}{2}(q - h)(q + h + 1), \]

so that values of the unknowns can be found in at least one way if

\[ \tfrac{1}{2}(q + h + 1) \le q \quad\text{or}\quad h + 1 \le q. \]

Now suppose we express a sum of squares of q normal variates with unit variance, say A, as the sum of two quantities B and C; and suppose that B is distributed as the sum of squares of h independent normal variates with unit variance which are linear functions of the variates entering into A. Then we can find q - h such variates independent of the first h, and C must be their sum of squares. Further, the distributions of B and C are independent. By an extension of the same argument, if

\[ A = A_1 + A_2 + \dots + A_k , \tag{23.7} \]

A is distributed as \chi^2 with \nu degrees of freedom, A_1 with \nu_1, . . ., A_{k-1} with \nu_{k-1}; and if the variates entering into A_1 . . . A_{k-1} are mutually independent and are linear functions of those entering into A, then A_k is distributed as \chi^2 with \nu_k degrees of freedom, where

\[ \nu = \nu_1 + \nu_2 + \dots + \nu_k , \tag{23.8} \]

and A_k is independent of A_1, . . ., A_{k-1}.
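A concrete set of coefficients satisfying the orthogonality relations (23.6) may help to fix ideas. The Python sketch below uses the Helmert arrangement, which is one convenient choice among the multiplicity of possible ones and is not prescribed by the text; the function names and the sample values are illustrative only. The first transformed variate is the mean multiplied by the square root of q, so that with h = 1 the quantity B is the square of that variate and C is the sum of squares about the mean.

```python
import math


def helmert(q):
    """Return q rows of coefficients; the rows are orthonormal, as in (23.6)."""
    rows = [[1 / math.sqrt(q)] * q]
    for r in range(1, q):
        norm = math.sqrt(r * (r + 1))
        rows.append([1 / norm] * r + [-r / norm] + [0.0] * (q - r - 1))
    return rows


def transform(lam, x):
    """Apply (23.5): zeta_r = sum_s lambda_rs x_s."""
    return [sum(l * v for l, v in zip(row, x)) for row in lam]


lam = helmert(5)
x = [0.3, -1.2, 0.7, 2.1, -0.4]          # arbitrary illustrative values
zeta = transform(lam, x)

mean = sum(x) / len(x)
total_ss = sum(v * v for v in x)                 # A: sum of squares of the x's
about_mean_ss = sum((v - mean) ** 2 for v in x)  # C: sum of squares about mean

print(sum(z * z for z in zeta), total_ss)        # the rotation preserves A
print(sum(z * z for z in zeta[1:]), about_mean_ss)
```

The two printed pairs should agree to rounding error, showing that the q - 1 remaining variates carry exactly the sum of squares about the mean.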


23.6. As an extension and kind of converse of this theorem we have the result, due to Cochran, that if A_1 . . . A_k are distributed as \chi^2 with \nu_1 . . . \nu_k degrees of freedom, and their sum A is distributed as \chi^2 with \nu = \sum(\nu_j) degrees, then A_1 . . . A_k are independent. We will prove this for the case k = 2, the more general result following in a similar way.

If the characteristic function of A_1 and A_2 is \phi(t_1, t_2), we have, by hypothesis,

\[ \phi(t_1, 0) = \frac{1}{(1 - 2it_1)^{\frac{1}{2}\nu_1}} , \qquad \phi(0, t_2) = \frac{1}{(1 - 2it_2)^{\frac{1}{2}\nu_2}} , \]

and

\[ \phi(t, t) = \frac{1}{(1 - 2it)^{\frac{1}{2}(\nu_1 + \nu_2)}} . \]

Hence

\[ \phi(t, t) = \phi(t, 0)\,\phi(0, t), \]

and thus \phi(t, 0) and \phi(0, t) are both divisible by a factor in (1 - 2it) and no other factor in t because of the symmetry of \phi(t_1, t_2). These factors are identified by \phi(t_1, 0) and \phi(0, t_2) as (1 - 2it_1)^{-\frac{1}{2}\nu_1} and (1 - 2it_2)^{-\frac{1}{2}\nu_2}, and hence

\[ \phi(t_1, t_2) = \phi(t_1, 0)\,\phi(0, t_2), \]

or A_1 and A_2 are independent.

23.7. Let us now return to the statements in (23.4). The sum \frac{1}{v}\sum(x_{ij} - x_{..})^2 is distributed as \chi^2 with \nu = N - 1. The sum \frac{1}{v}\sum(x_{ij} - x_{.j})^2 is so distributed with \nu_1 = N - p. Further, the quantities x_{ij} - x_{.j} may be transformed to N - p independent normal variates which are linear functions of the variates entering into the first sum. It follows from 23.5 that because of the identity (23.3) the third sum \frac{1}{v}\sum n_j(x_{.j} - x_{..})^2 is distributed as \chi^2 with \nu_2 = (N - 1) - (N - p) = p - 1 degrees of freedom, and that independently of the second sum.

Thus we may exhibit our break-up of the total sum in the following form:

TABLE 23.1

Form of Analysis of Variance for One-way Classification.

  Sum of Squares.                                                         d.f.     Quotient.

  Of family means about the mean of
  the whole                           \sum_j n_j(x_{.j} - x_{..})^2       p - 1    \frac{1}{p-1}\sum_j n_j(x_{.j} - x_{..})^2

  Of individuals in families about
  the respective family mean          \sum_{i,j}(x_{ij} - x_{.j})^2       N - p    \frac{1}{N-p}\sum_{i,j}(x_{ij} - x_{.j})^2

  Of individuals about the mean of
  the whole                           \sum_{i,j}(x_{ij} - x_{..})^2       N - 1    \frac{1}{N-1}\sum_{i,j}(x_{ij} - x_{..})^2

We note that the sums of squares and the degrees of freedom in the first two rows sum to those in the third row (though the quantities in the quotient column are not additive). This is the origin of the expression "analysis of variance," though, to be accurate, it is the sum of squares of the total which is analysed.

To avoid cumbrous phrases we refer to the sum of squares of family means about the mean of the whole as the sum of squares "between families," and to that of individuals about the respective family-means (for the time being) as "residual." We shall also speak of total sum of squares and total mean with the obvious significance, and denote degrees of freedom by the initial letters "d.f." *

* The need has been felt for a word to denote "sum of squares about the mean". Professor Pitman has suggested the word "squariance", though he seems to feel that this leaves something to be desired. In my own notes I use the word "deviance" but have not ventured to introduce it into the text.

23.8. Since the mean value of \chi^2 with \nu degrees of freedom is \nu, the quotients in Table 23.1 are all unbiassed estimators of v, the parent variance. Only the first two, however, are independent. We recall that the ratio

\[ z = \tfrac{1}{2}\log_e \frac{(N - p)\sum_j n_j(x_{.j} - x_{..})^2}{(p - 1)\sum_{i,j}(x_{ij} - x_{.j})^2} \tag{23.9} \]

is distributed in Fisher's form, which is independent of the variance v. This distribution accordingly provides a convenient test of significance in the normal case.

Example 23.1

Let us consider the application of the foregoing theory to a simple example which has been chosen to reduce the arithmetic to a small amount. The following shows the lives in hours of four batches of electric lamps:

Batch 1 : 1600, 1610, 1650, 1680, 1700, 1720, 1800.

Batch 2 : 1580, 1640, 1640, 1700, 1750.

Batch 3 : 1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820.

Batch 4 : 1510, 1520, 1530, 1570, 1600, 1680.

We know that the batches were made from four different specimens of wire, but were otherwise made under identical conditions. (This, of course, over-simplifies the problem as it is encountered in practice, but will serve for purposes of illustration.) The question is, do the batches differ among themselves in length of life? If so, we suspect that the quality of wire is varying materially, and if the lamps are to be standardised as far as possible the quality of wire must be made more uniform from batch to batch before manufacture is undertaken. The numbers in this example are small, but not much smaller than would be desirable in practice, owing to the expense and time involved in testing a lamp by running it until it burns out.

The sums of x and x^2 for the four batches will be found to be:

              Number in Sample.     Σ(x)         Σ(x^2)
  Batch 1            7             11,760      19,785,400
    ,,  2            5              8,310      13,828,100
    ,,  3            8             13,090      21,503,700
    ,,  4            6              9,410      14,778,700
  Totals            26             42,570      69,895,900

Thus for the mean life of lamp in the four batches we have 11,760/7 = 1680; 8310/5 = 1662; 13,090/8 = 1636.25; 9410/6 = 1568.33. These certainly differ, but is the variation such as cannot have arisen by mere sampling fluctuations?

We find

  x_{..} = 42,570/26 = 1637.3077.

Thus

  \sum_{i,j}(x_{ij} - x_{..})^2 = \sum x_{ij}^2 - N x_{..}^2
                               = 69,895,900 - 69,700,189
                               = 195,711.

We also have

  \sum_j n_j(x_{.j} - x_{..})^2 = \sum_j (n_j x_{.j})\,x_{.j} - N x_{..}^2
                               = 44,360.

The analysis then takes the form:

                      Sum of Squares.    d.f.    Quotient.
  Between batches          44,360          3      14,787
  Residual                151,351         22       6,880
  Totals                  195,711         25       7,828

We have

  z = \tfrac{1}{2}\log_e \frac{14,787}{6880} = 0.383,   \nu_1 = 3,   \nu_2 = 22.

The 5-per-cent. point for these degrees of freedom is seen from the tables to be 0.5574. The observed value is therefore not significant, and we conclude that, so far as this test is concerned, there is nothing to throw doubt on the homogeneity of the group.

Having decided, provisionally at least, to accept the hypothesis that the data are homogeneous, we may ask, what is the best estimate of the parent variance? Our analysis has given three different estimates, viz. 14,787, 6880 and 7828. It seems natural to use the last, which depends on the greatest number of degrees of freedom.

With this value we find for the variance of the mean of samples of n,

  \frac{7828}{n} = \left(\frac{88.48}{\sqrt{n}}\right)^2 .

The greatest difference of means observed is that between the first and fourth batch, 1680 - 1568.33 = 111.67. The standard error of this difference is

  88.48\,\sqrt{\tfrac{1}{7} + \tfrac{1}{6}} = 49.2.

The observed difference is rather more than twice the standard error, but we cannot conclude that it is significant on that account. In fact, we have picked out the greatest difference for examination from the six possible comparisons of pairs, and the distribution of the greatest difference must have a larger standard error than that of a difference chosen at random, which is what we have found. Nevertheless the fact that even the greatest difference is only slightly in excess of twice the standard error affords some general evidence in support of the hypothesis of homogeneity.

We may also note that if a more accurate test of the difference of two means is required the t-test may be invoked; but here also we must remember that we are testing the greatest of a set of differences. Where there are only two families concerned, the analysis of variance reduces to the t-test for the difference of sample means when variances of the parents are assumed equal.
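The computations of Example 23.1 can be reproduced in a few lines. The following Python sketch forms the between-batch and residual sums of squares of Table 23.1 directly from the lamp lives and evaluates z as in (23.9); the function name `one_way_anova` and the layout of its result are choices made for this illustration.

```python
import math

# Lives in hours of the four batches of lamps, as given in the example.
batches = [
    [1600, 1610, 1650, 1680, 1700, 1720, 1800],
    [1580, 1640, 1640, 1700, 1750],
    [1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820],
    [1510, 1520, 1530, 1570, 1600, 1680],
]


def one_way_anova(families):
    """Return sums of squares, degrees of freedom and Fisher's z."""
    n_total = sum(len(f) for f in families)
    p = len(families)
    grand_mean = sum(sum(f) for f in families) / n_total
    means = [sum(f) / len(f) for f in families]
    between = sum(len(f) * (m - grand_mean) ** 2
                  for f, m in zip(families, means))
    residual = sum((x - m) ** 2 for f, m in zip(families, means) for x in f)
    z = 0.5 * math.log((between / (p - 1)) / (residual / (n_total - p)))
    return {
        "between": between, "residual": residual,
        "total": between + residual,
        "df_between": p - 1, "df_residual": n_total - p,
        "z": z,
    }


result = one_way_anova(batches)
for key, value in result.items():
    print(f"{key:12s} {value:12.4f}")
```

The value of z is then compared with the tabulated 5-per-cent. point for 3 and 22 degrees of freedom, as in the example.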


23.9. Suppose now that in the case of one classification we have applied a test by means of the analysis of variance and have found that the hypothesis of homogeneity is unacceptable, or, in plain English, that the parents do differ. Let us then consider the alternative that the populations are still normal and that they differ in their means but not in their variances.

At first sight this may seem a highly artificial assumption to make, for if the populations differ in their means it is not unlikely that they may differ in other respects. This is undoubtedly so, but if there is serious possibility of difference in variances their homogeneity may be discussed separately by means of tests we shall consider in Chapter 26. Apart from this, there often arise in practice situations in which approximate equality of variance is plausible on prior grounds. For instance, we may be testing the effect of manuring on cereal yields, and it is reasonable to suppose that if the manure exerts any effect at all it will increase all plants of the same variety to about the same extent; that it will, in fact, displace the location of the distribution of yields without affecting its dispersion.


23.10. The question we have now to consider is whether we can make an estimate of the common variance of the populations. A little thought will show that we can. The reasoning which led to the conclusion that the residual sum of squares is distributed as v\chi^2 with N - p degrees of freedom remains unchanged, so that the residual quotient in Table 23.1 continues to provide an estimator of v. The other two no longer do so. Consider, in fact, the sum of squares between families, and let the mean of the jth family be m_{.j}. Then we have

\[ E\sum_j n_j(x_{.j} - x_{..})^2 = E\sum_j n_j\{(x_{.j} - m_{.j}) - (x_{..} - m_{..}) + (m_{.j} - m_{..})\}^2 \]
\[ \qquad = E\sum_j n_j\{(x_{.j} - m_{.j}) - (x_{..} - m_{..})\}^2 + \sum_j n_j(m_{.j} - m_{..})^2 , \tag{23.10} \]

where m_{..} is the mean \frac{1}{N}\sum_j n_j m_{.j}, and hence x_{..} has the mean m_{..}. Thus \sum_j n_j\{(x_{.j} - m_{.j}) - (x_{..} - m_{..})\}^2 is distributed as v\chi^2 with p - 1 degrees of freedom and

\[ E\sum_j n_j(x_{.j} - x_{..})^2 = (p - 1)v + \sum_j n_j(m_{.j} - m_{..})^2 . \tag{23.11} \]

Not unless m_{.j} = m_{..}, that is, all populations have the same mean, does the expression on the right reduce to (p - 1)v and hence the quotient between families give an unbiassed estimator of v. In other cases it is greater.

Similarly,

\[ E\sum_{i,j}(x_{ij} - x_{..})^2 = E\sum_{i,j}\{(x_{ij} - m_{.j}) - (x_{..} - m_{..}) + (m_{.j} - m_{..})\}^2 \]
\[ \qquad = (N - 1)v + \sum_j n_j(m_{.j} - m_{..})^2 . \tag{23.12} \]

The expectation of the difference of the two terms considered in (23.11) and (23.12) confirms that the residual sum of squares provides an estimator of (N - p)v.

23.11. A comparison of the formulae we have already reached and those of section 14.31 will show that the study of intra-class correlation is very closely related to the analysis of variance. It is an interesting exercise to derive the z-test directly from the sampling distribution of intra-class r given in equation (14.110) (vol. I, p. 362) and vice versa.

23.12. We now proceed to the case when the variate-values belong not to one set of families but to two, say A and B. In the first instance we shall consider the situation when there is only a single value in the jth class of A and the kth class of B. Our sample may then be set out in the tabular form:

                            Class B
               B_1      B_2      B_3     . . .    B_q      Totals
       A_1     x_11     x_12     x_13    . . .    x_1q     q x_1.
       A_2     x_21     x_22     x_23    . . .    x_2q     q x_2.
  Class A_3    x_31     x_32     x_33    . . .    x_3q     q x_3.
   A    .       .        .        .                .         .
       A_p     x_p1     x_p2     x_p3    . . .    x_pq     q x_p.
     Totals    p x_.1   p x_.2   p x_.3  . . .    p x_.q   pq x_..          (23.13)

This is not a contingency table. The numbers x_{jk} are variate-values, not frequencies. As usual, x_{j.} signifies the mean of values in the class A_j and x_{.k} the mean of values in the class B_k, x_{..} being the mean of the whole.

We have the algebraic identity

\[ \sum_{j,k}(x_{jk} - x_{..})^2 = \sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..} + x_{j.} - x_{..} + x_{.k} - x_{..})^2 \]
\[ \qquad = \sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2 + \sum_{j,k}(x_{j.} - x_{..})^2 + \sum_{j,k}(x_{.k} - x_{..})^2 \]
\[ \qquad = \sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2 + q\sum_j(x_{j.} - x_{..})^2 + p\sum_k(x_{.k} - x_{..})^2 , \tag{23.14} \]

the cross-product terms vanishing on summation in the usual way.


23.13. We are interested in the variation of the x's according to class membership. Let us take as our hypothesis that the pq values are homogeneous, that is to say that they all emanate from (normal) populations with the same mean m and variance v. In such a case class-membership exerts no influence on variate-values, and the observed differences are pure sampling effects.

The expression on the left in (23.14) is then distributed as v\chi^2 with pq - 1 degrees of freedom. The mean x_{j.} is distributed normally with variance v/q and thus \sum_j q(x_{j.} - x_{..})^2 is distributed as v\chi^2 with p - 1 d.f. Similarly, \sum_k p(x_{.k} - x_{..})^2 is so distributed with q - 1 d.f. Finally the remaining term on the right is distributed as v\chi^2 with (p - 1)(q - 1) d.f.; for each term is normal with variance v(p - 1)(q - 1)/(pq), since

\[ x_{jk} - x_{j.} - x_{.k} + x_{..} = \frac{(p-1)(q-1)}{pq}\,x_{jk} - \frac{p-1}{pq}\sum_{k' \neq k} x_{jk'} - \frac{q-1}{pq}\sum_{j' \neq j} x_{j'k} + \frac{1}{pq}\sum_{j' \neq j,\ k' \neq k} x_{j'k'} , \]

so that the sum of squares of coefficients on the right is

\[ \left\{\frac{(p-1)(q-1)}{pq}\right\}^2 + (q - 1)\left(\frac{p-1}{pq}\right)^2 + (p - 1)\left(\frac{q-1}{pq}\right)^2 + \frac{(p-1)(q-1)}{(pq)^2} = \frac{(p-1)(q-1)}{pq} . \tag{23.15} \]

Thus, since there are p + q - 1 linear relations connecting the pq quantities

\[ x_{jk} - x_{j.} - x_{.k} + x_{..} , \]

their sum of squares is distributed as v\chi^2 with pq - (p + q - 1) = (p - 1)(q - 1) degrees of freedom, which checks against the mean value of the individual square given by (23.15). We may thus analyse the variance in the following way:

TABLE 23.2

Form of Analysis of Variance for Two-way Classification with One Member in each Subclass

  Sums of Squares.                                                          d.f.              Quotient.

  Between A-classes    q\sum_j(x_{j.} - x_{..})^2                           p - 1             \frac{q}{p-1}\sum_j(x_{j.} - x_{..})^2

  Between B-classes    p\sum_k(x_{.k} - x_{..})^2                           q - 1             \frac{p}{q-1}\sum_k(x_{.k} - x_{..})^2

  Residual             \sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2      (p - 1)(q - 1)    \frac{1}{(p-1)(q-1)}\sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2

  Totals               \sum_{j,k}(x_{jk} - x_{..})^2                        pq - 1

The sums of squares and degrees of freedom (but not the quotients) are additive as before. It follows from the theorem of 23.6 that the three constituent sums are independent. Each quotient provides an unbiassed estimator of v.
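The check expressed by (23.15), that the squared coefficients of a single residual x_{jk} - x_{j.} - x_{.k} + x_{..} sum to (p - 1)(q - 1)/(pq), can be confirmed by direct enumeration. The Python sketch below builds the coefficients from the definitions of the means, using exact fractions; the function names are illustrative and the residual of the first cell is taken without loss of generality.

```python
from fractions import Fraction


def residual_coefficients(p, q):
    """Coefficients of each x_jk in the residual of cell (0, 0).

    The residual is x_00 - (row mean) - (column mean) + (grand mean),
    with p classes of A (rows) and q classes of B (columns).
    """
    return [[Fraction(int(j == 0 and k == 0))
             - Fraction(int(j == 0), q)      # from the row mean x_j.
             - Fraction(int(k == 0), p)      # from the column mean x_.k
             + Fraction(1, p * q)            # from the grand mean x_..
             for k in range(q)] for j in range(p)]


def coefficient_square_sum(p, q):
    return sum(coef ** 2 for row in residual_coefficients(p, q) for coef in row)


for p, q in [(2, 2), (3, 4), (25, 4)]:
    print(p, q, coefficient_square_sum(p, q), Fraction((p - 1) * (q - 1), p * q))
```

Each line of output should show the same fraction twice; for p = 25 and q = 4 both are 18/25.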

23.14. Our use of these results proceeds by an easy generalisation of the method exemplified in Example 23.1. We take as our hypothesis the supposition that all samples are from normal populations with identical mean and variance. Comparison of the estimates in the quotient column then provides a test of significance. If the hypothesis is rejected we may examine the alternative that means are different but variances identical throughout, in which case we shall find that the residual still provides an estimate of the variance, provided that an important additional assumption is made.


Example 23.2

The following data (Daniels, Supp. J.R.S.S., 1938, 5, 89) show the weight in grams of 95-yard lengths of wool thread from 100 "ends" being spun on four bobbins, 25 ends to the bobbin. We are interested in two factors, the variation between bobbins and the variation in the 25 ends on the same bobbin, according to their position.

TABLE 23.3

Weight in Grams of 100 95-yard Lengths of Wool Thread spun on Four Bobbins.

                          Bobbin Number.
  End Number.      1        2        3        4       Totals.
       1          7.50     7.23     7.50     7.53      29.76
       2          7.52     7.81     7.77     8.05      31.15
       3          7.70     7.94     7.83     8.16      31.63
       4          7.93     7.94     7.96     7.76      31.59
       5          7.78     7.89     8.02     7.85      31.54
       6          7.73     8.23     7.99     8.14      32.09
       7          8.07     8.27     8.25     8.26      32.85
       8          8.01     8.54     8.24     8.54      33.33
       9          8.22     8.24     8.37     8.10      32.93
      10          8.24     8.35     8.43     8.15      33.17
      11          8.17     8.29     8.46     8.38      33.30
      12          8.09     8.54     8.33     8.47      33.43
      13          8.11     8.45     8.27     8.38      33.21
      14          7.96     8.43     8.24     8.60      33.23
      15          8.09     8.47     8.12     8.45      33.13
      16          8.04     8.33     8.14     8.43      32.94
      17          7.78     8.47     8.19     8.57      33.01
      18          8.11     8.63     8.36     8.38      33.48
      19          8.17     8.31     8.31     8.16      32.95
      20          8.12     8.31     8.47     8.41      33.31
      21          8.13     8.10     8.19     8.27      32.69
      22          8.01     8.01     8.37     7.96      32.35
      23          8.17     7.92     8.27     8.08      32.44
      24          8.05     8.27     8.07     8.16      32.55
      25          7.91     7.92     8.28     8.52      32.63
    Totals      199.61   204.89   204.43   205.76     814.69

It simplifies the arithmetic if we take a working mean at 8.00. The total sum of squares about this mean is then found to be

  \sum_{j,k}(x_{jk} - 8)^2 = 9.3829,

and we have also

  \sum_{j,k}(x_{jk} - 8) = 14.69.

Hence

  \sum_{j,k}(x_{jk} - x_{..})^2 = 9.3829 - (0.1469)(14.69)
                               = 7.224,939.

The means of the four bobbins are

  7.9844, 8.1956, 8.1772, 8.2304.

With the same working mean we find for the sum of squares

  \sum_k(x_{.k} - 8)^2 = 0.122,986,72;

and hence

  p\sum_k(x_{.k} - x_{..})^2 = 25 (0.122,986,72) - (0.1469)(14.69)
                            = 0.916,707.

The means of the four ends of corresponding position on the four bobbins can, of course, be found from the totals in the last column of the table, but it is simpler to find \sum_j(q x_{j.} - q x_{..})^2 and then divide by q^2. We find

  q\sum_j(x_{j.} - x_{..})^2 = \tfrac{1}{4}(27.1831) - (0.1469)(14.69)
                            = 4.637,814.

The continual appearance of the factor (0.1469)(14.69) = N x_{..}^2 (x_{..} being measured from the working mean) is to be noted. The quantity is best computed once for all at the outset.

The residual sum of squares is then obtainable by subtraction, and we have the following analysis:


TABLE 23.4

Analysis of Variance for the Data of Table 23.3.

                      Sums of Squares.     d.f.     Quotient.
Between bobbins           0.916707           3       0.3056
Between ends              4.637814          24       0.1932
Residual                  1.670418          72       0.0232
Totals                    7.224939          99       0.0730


The variation between bobbins and that between ends are both significant: the ratio
of the corresponding quotients to the residual quotient is so big in each case as hardly to
require the z-test. We are led to suspect that the variation between bobbins, small as it
is, cannot be a chance effect, and it looks as if bobbin number 1 is not getting its fair share
of thread. Similarly, the weight of thread seems to be dependent on whereabouts the
thread is spun on the bobbins, and an inspection of the original data suggests a systematic
variation as we proceed along the bobbin from end number 1 to end number 25, with a
possible maximum in the middle. If the manufacturing process is to be standardised as
much as possible, we should have to examine the reasons for the shortage of weight on
the first bobbin and for this systematic effect of position on the bobbin.
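
The quotients and variance ratios of Table 23.4 may be reproduced as follows. This is only a sketch: it takes the sums of squares as given in the table, obtains the residual by subtraction, and converts each ratio to z by the relation z = ½ log_e (ratio). The names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from Table 23.4.
total_ss, total_df = 7.224939, 99
bobbins_ss, bobbins_df = 0.916707, 3
ends_ss, ends_df = 4.637814, 24

# The residual is obtained by subtraction, as in the text.
residual_ss = total_ss - bobbins_ss - ends_ss
residual_df = total_df - bobbins_df - ends_df

# Quotients: sum of squares divided by degrees of freedom.
q_bobbins = bobbins_ss / bobbins_df
q_ends = ends_ss / ends_df
q_residual = residual_ss / residual_df

# Variance ratios against the residual quotient, and the corresponding z.
ratio_bobbins = q_bobbins / q_residual
ratio_ends = q_ends / q_residual
z_bobbins = 0.5 * math.log(ratio_bobbins)
z_ends = 0.5 * math.log(ratio_ends)

print(round(q_bobbins, 4), round(q_ends, 4), round(q_residual, 4))
print(round(ratio_bobbins, 2), round(ratio_ends, 2))
```

Both ratios come out well above 8, which is the sense in which the text describes them as hardly requiring a formal test.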

23.15. Suppose now that, as in the example just given, the hypothesis of homogeneity
is rejected. What interpretation can we put on the residual quotient? Let us
assume that each observation comes from a normal population with variance $v$, but that
the parent mean of the subclass $A_j B_k$ is $m_{jk}$, these quantities varying from one subclass
to another. Is the residual quotient an unbiassed estimator of $v$? In general the answer
is “no”, but there is an important class of case in which it is affirmative.

Let $m_{j.}$ be the mean of the $q$ values of $m_{jk}$ in the class $A_j$, $m_{.k}$ that of the $p$ values
in $B_k$, and $m_{..}$ the mean of the whole set of $m$'s. Then we may write

$$x_{jk} = m_{jk} + \xi_{jk} \tag{23.16}$$

$$x_{j.} = m_{j.} + \xi_{j.}, \text{ etc.} \tag{23.17}$$

Then

$$
\begin{aligned}
E \sum\sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2
 &= E \sum\sum (m_{jk} - m_{j.} - m_{.k} + m_{..} + \xi_{jk} - \xi_{j.} - \xi_{.k} + \xi_{..})^2 \\
 &= \sum\sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2 + E \sum\sum (\xi_{jk} - \xi_{j.} - \xi_{.k} + \xi_{..})^2
\end{aligned} \tag{23.18}
$$

the product term vanishing as usual. The second term on the right is equal to
$(p - 1)(q - 1)v$, for the $\xi$'s are distributed with variance $v$ about zero mean, so that the
term in question is the residual sum of squares in a $p \times q$ two-way classification of a
homogeneous sample and hence has the stated expectation. Thus we have

$$E \sum\sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = \sum\sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2 + (p - 1)(q - 1)v. \tag{23.19}$$

The residual quotient will then provide an unbiassed estimator of $v$ if and only if

$$m_{jk} - m_{j.} - m_{.k} + m_{..} = 0. \tag{23.20}$$


23.16. Now suppose that $x_{jk}$ is made up of three parts which are additive, viz.

(1) the effect of the class $A_j$, $a_j$ say;

(2) the effect of the class $B_k$, $b_k$ say; and

(3) a residual $\zeta_{jk}$ which is normal and has zero mean.

This kind of hypothesis will recur frequently. It amounts to an assumption that there
is in $x_{jk}$ an element $a_j$ which affects alike all members of the class $A_j$ but varies from one
A-class to another; an element $b_k$ which similarly affects alike all members of $B_k$ but varies
from B-class to B-class; and a third component representing random variation which,
apart from the sampling factor, is the same for all subclasses $A_j B_k$. We then have

$$x_{jk} = a_j + b_k + \zeta_{jk} \tag{23.21}$$

and

$$
\begin{aligned}
m_{jk} &= a_j + b_k \\
m_{j.} &= a_j + b_{.} \\
m_{.k} &= a_{.} + b_k \\
m_{..} &= a_{.} + b_{.}
\end{aligned} \tag{23.22}
$$

where, as usual, the subscript periods in the $a$'s and $b$'s denote averaging. Thus

$$m_{jk} - m_{j.} - m_{.k} + m_{..} = a_j + b_k - (a_j + b_{.}) - (a_{.} + b_k) + a_{.} + b_{.} = 0,$$

so that (23.20) is satisfied and the residual quotient is an unbiassed estimator of the
variance $v$.

Under the same conditions it will be found that

$$
\begin{aligned}
q\,E \sum_j (x_{j.} - x_{..})^2 &= (p - 1)v + q \sum_j (m_{j.} - m_{..})^2 \\
 &= (p - 1)v + q \sum_j (a_j - a_{.})^2
\end{aligned} \tag{23.23}
$$

$$p\,E \sum_k (x_{.k} - x_{..})^2 = (q - 1)v + p \sum_k (b_k - b_{.})^2 \tag{23.24}$$

$$
\begin{aligned}
E \sum\sum (x_{jk} - x_{..})^2 &= (pq - 1)v + \sum\sum (a_j - a_{.} + b_k - b_{.})^2 \\
 &= (pq - 1)v + q \sum_j (a_j - a_{.})^2 + p \sum_k (b_k - b_{.})^2
\end{aligned} \tag{23.25}
$$
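
Condition (23.20) can be illustrated numerically. The sketch below, with purely hypothetical values of the a's and b's, evaluates the sum of squares of $m_{jk} - m_{j.} - m_{.k} + m_{..}$ for a table of parent means: it vanishes when the means are additive as in (23.22), and does not vanish when the two effects are multiplied together instead.

```python
def residual_ss(table):
    """Sum over all cells of (m_jk - m_j. - m_.k + m..)^2 for a p x q table."""
    p, q = len(table), len(table[0])
    row_means = [sum(row) / q for row in table]
    col_means = [sum(table[j][k] for j in range(p)) / p for k in range(q)]
    grand_mean = sum(row_means) / p
    return sum(
        (table[j][k] - row_means[j] - col_means[k] + grand_mean) ** 2
        for j in range(p)
        for k in range(q)
    )

# Hypothetical class effects, chosen only for illustration.
a = [1.0, 2.5, 4.0]          # effects of the A-classes (p = 3)
b = [0.0, 0.5, -1.0, 2.0]    # effects of the B-classes (q = 4)

additive = [[aj + bk for bk in b] for aj in a]       # m_jk = a_j + b_k
non_additive = [[aj * bk for bk in b] for aj in a]   # effects entangled

print(residual_ss(additive))       # zero apart from rounding error
print(residual_ss(non_additive))   # positive: this term would bias the residual quotient
```

In the additive case the residual quotient remains an unbiassed estimator of v; in the second case the positive term is exactly the first term on the right of (23.19).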


23.17. We have supposed that the component $\zeta$ had a zero mean, but of course if
all these components had the same mean, the constant common to them could be absorbed
into the functions $a_j$ and $b_k$. Our hypothesis is thus a little more general than it appears.
In certain practical cases it is a plausible hypothesis to make. For instance, in Example
23.2 it is reasonable to suppose that the effect of a particular bobbin is the same for all
ends, and the effect of situation the same for all bobbins. If there is any serious doubt
on the point we have to collect further data and consider interactions in the manner
described later (see 23.22).

It may, however, be noted that if the variation of the $m_{jk}$'s is comparatively small
the appearance of the term containing them in (23.19) does not materially vitiate an estimate
of $v$ from the residual quotient. In any case that estimate will be greater than the unbiassed
estimate, so that our inferences about significant differences of mean values will, properly
interpreted, be on the safe side.


23.18. Before going farther we may remark that the quantity we have called the
residual sum of squares and the associated quotient are often referred to as “error” or
“interaction” terms. The former is likely to cause misunderstanding and is better avoided
altogether, for, as we have seen, it provides a measure of sampling variance, and therefore
of experimental error, only in particular cases. The word “interaction” we shall
define below; it has been used in different senses by different writers, and when consulting
original memoirs the reader should endeavour to ascertain the precise meaning which
is being attached to it, if he can. In considering a given analysis it is as well to reflect
on the precise nature of the items covered by such expressions as “residual”, “remainder”,
“error” and so forth.


Three-way Classification

23.19. Consider now the case when there are three classifications into A-, B- and
C-classes. As before, we shall consider in the first place one member in each subclass
$A_j B_k C_l$, typified by $x_{jkl}$. We now have

$$
\begin{aligned}
\sum (x_{jkl} - x_{...})^2 ={}& \sum (x_{j..} - x_{...})^2 + \sum (x_{.k.} - x_{...})^2 + \sum (x_{..l} - x_{...})^2 \\
 &+ \sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2 + \sum (x_{j.l} - x_{j..} - x_{..l} + x_{...})^2 \\
 &+ \sum (x_{.kl} - x_{.k.} - x_{..l} + x_{...})^2 \\
 &+ \sum (x_{jkl} - x_{jk.} - x_{j.l} - x_{.kl} + x_{j..} + x_{.k.} + x_{..l} - x_{...})^2
\end{aligned} \tag{23.26}
$$

the summations extending over all members of the sample, $N = pqr$ in number, so that we may
replace expressions such as $\sum_{j,k,l} (x_{j..} - x_{...})^2$ by $qr \sum_j (x_{j..} - x_{...})^2$.

On the usual hypothesis of normality and homogeneity we find that the first three
terms on the right of (23.26) are distributed as $v\chi^2$ with $p - 1$, $q - 1$ and $r - 1$ degrees
of freedom. The second group is so distributed with $(p - 1)(q - 1)$, $(p - 1)(r - 1)$ and
$(q - 1)(r - 1)$ degrees of freedom. The last is distributed with $(p - 1)(q - 1)(r - 1)$
degrees of freedom. All but the last of these results follow from the two-way case, and
the last may be established as in 23.13 or by the consideration that for any fixed $l$ the
term has $(p - 1)(q - 1)$ degrees of freedom and that there are $(r - 1)$ independent $l$'s.

We may then write the analysis in the form shown in Table 23.5. (For the present
the expression “interaction AB” is to be regarded merely as a name given to a particular
sum of squares. As before, the sums of squares and degrees of freedom are additive,
and the seven items into which the total sum of squares is analysed are distributed
independently.)

TABLE 23.5

Form of Analysis of Variance for Three-way Classification with One Member in each Subclass.

                      Sum of Squares.                                               d.f.

Between A-classes     $\sum (x_{j..} - x_{...})^2$                                  $p - 1$
Between B-classes     $\sum (x_{.k.} - x_{...})^2$                                  $q - 1$
Between C-classes     $\sum (x_{..l} - x_{...})^2$                                  $r - 1$
Interaction AB        $\sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2$              $(p - 1)(q - 1)$
Interaction BC        $\sum (x_{.kl} - x_{.k.} - x_{..l} + x_{...})^2$              $(q - 1)(r - 1)$
Interaction CA        $\sum (x_{j.l} - x_{j..} - x_{..l} + x_{...})^2$              $(r - 1)(p - 1)$
Residual              $\sum (x_{jkl} - x_{jk.} - x_{.kl} - x_{j.l}$
                      $\quad + x_{j..} + x_{.k.} + x_{..l} - x_{...})^2$            $(p - 1)(q - 1)(r - 1)$
Totals                $\sum (x_{jkl} - x_{...})^2$                                  $pqr - 1$

Quotient: in each line, the quotient of the sum of squares by the corresponding d.f.

23.20. If the hypothesis of homogeneity is rejected we may consider the alternative
represented by

$$x_{jkl} = a_j + b_k + c_l + \zeta_{jkl}, \tag{23.27}$$

where $\zeta$, as usual, is normal with zero mean. As in 23.16 it will be found that the residual
term in Table 23.5 has expectation $(p - 1)(q - 1)(r - 1)v$ and hence continues to provide
an unbiassed estimator of $v$. The quotients between classes are affected like those in
equations (23.23) to (23.25); but the interaction terms also provide estimators of $v$ with
the appropriate degrees of freedom. For instance,

$$
\begin{aligned}
x_{jk.} - x_{j..} - x_{.k.} + x_{...} ={}& a_j + b_k + c_{.} + \zeta_{jk.} - (a_j + b_{.} + c_{.} + \zeta_{j..}) \\
 &- (a_{.} + b_k + c_{.} + \zeta_{.k.}) + (a_{.} + b_{.} + c_{.} + \zeta_{...}) \\
 ={}& \zeta_{jk.} - \zeta_{j..} - \zeta_{.k.} + \zeta_{...}
\end{aligned} \tag{23.28}
$$

so that the expectation of the sum of squares of the $x$-terms is that of the $\zeta$-terms, which
we know to be $(p - 1)(q - 1)v$.

23.21. This brings up a new point arising for the first time in the three-way
classification. If (23.27) is true, the analysis of variance will provide four different estimators
of the variance $v$, namely the interactions AB, BC and CA and the residual. These are
independent (for they depend only on the $\zeta$'s, and the theory appropriate to the case of
homogeneity continues to apply) and their ratios may be tested in the z-distribution. If
these ratios are such as can have arisen from random sampling we may accept the hypothesis
represented by (23.27); if not we must reject it. In short, the interaction quotients
provide a test of the hypothesis (23.27). In the two-way classification no such test is available.

Interactions

23.22. On the hypothesis (23.27) the interaction quotients of type AB give unbiassed
estimators of the variance $v$. If in any particular case these quotients differ significantly
among themselves or from any other independent estimator of $v$, we have to reject the
hypothesis. Apart from the normality of the variation of $\zeta$, which is not for the moment
in question, this means that we cannot represent the data as the sum of separate effects
due to A-, B- and C-classes, together with a residual $\zeta$ which is the same in form for all
subclasses. The effects of the classes are entangled, or, as we may say, they interact.
This is the origin of the term “interaction”.

Suppose, for instance, our data are crop-yields, and membership of the three classes
corresponds to applications of three manures, nitrogen (A), potash (B) and phosphate (C).
The hypothesis represented by (23.27) would then be equivalent to supposing that all three
manures exerted an effect on yields, but that they did so independently. A given dressing
of nitrogen would increase the yield by $a_j$, whatever dressings of the other fertilisers were
applied. But it might happen that the response in yield to $a_j$ varied according to how
much of the others were present: potash might either stimulate the effect of nitrogen or
inhibit it. If this were so, the fertilisers would interact and the hypothesis (23.27) would
break down. Significant departures from homogeneity in the interaction terms usually
lead us to search for possible entanglements of this kind.
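
The idea of entanglement may be made concrete with the smallest possible case. The figures below are entirely hypothetical (they are not taken from any experiment mentioned in the text) and the labels N0, N1, K0, K1 are merely illustrative names for two dressings of nitrogen and two of potash. If the fertilisers acted independently, the response to nitrogen would be the same at both levels of potash and the difference of the two responses would be zero.

```python
# Hypothetical mean yields in arbitrary units, for illustration only.
yields = {
    ("N0", "K0"): 20.0,
    ("N1", "K0"): 24.0,
    ("N0", "K1"): 22.0,
    ("N1", "K1"): 31.0,
}

# Response to the nitrogen dressing at each level of potash.
response_without_potash = yields[("N1", "K0")] - yields[("N0", "K0")]
response_with_potash = yields[("N1", "K1")] - yields[("N0", "K1")]

# Under purely additive effects this difference of responses would vanish.
interaction_contrast = response_with_potash - response_without_potash

print(response_without_potash, response_with_potash, interaction_contrast)
```

Here potash is imagined to stimulate the effect of nitrogen, so the contrast is positive; whether such a contrast exceeds what sampling variation could produce is, of course, the question the interaction quotient is meant to answer.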

23.23. It must not be overlooked, however, that significant interactions do not 
necessarily imply interaction in any real sense. They may arise from heterogeneity in 
the data. To return to our example of crop-yields, suppose the yields were taken from 
a series of plots which differed materially in natural fertility. It might very well be found 
that the hypothesis (23.27) could not be justified even if the differences in yields due to 
the natural effect were partially absorbed into the coefficients a, b and c. If by chance
the heavier dressings of fertilisers were applied to plots of greater fertility, the hypothesis 
might be shown as failing and “ significant ” interactions appear. Such points as this 
require careful consideration in the interpretation of significance, and we shall illustrate 
them in some examples below. 


23.24. Interactions of type AB, involving two classes, are said to be of the first
order. When considering the general n-way classification we shall see that there can
appear interactions of second, third, fourth . . . order. In fact, the residual in Table 23.5
is formally equivalent to an interaction of the second order, of type ABC, just as the
first-order interaction is equivalent to the residual in the two-way analysis of Table 23.2.

To complete the definitions, we may define the sum of squares between A-classes as
an interaction of order zero. The seven constituent items in Table 23.5 would then
correspond to the following:

               Interaction.     d.f.

Order zero     A                p - 1
               B                q - 1
               C                r - 1
Order 1        AB               (p - 1)(q - 1)
               BC               (q - 1)(r - 1)
               CA               (r - 1)(p - 1)
Order 2        ABC              (p - 1)(q - 1)(r - 1)

This illustrates the general symmetry of the analysis and suggests obvious
generalisations.


n-way Classifications 

23.25. For instance, with five classes A, B, C, D and E we may analyse the total
sums of squares into $2^5 - 1 = 31$ components. There will be $\binom{5}{1} = 5$ interactions of
order zero; $\binom{5}{2} = 10$ interactions of first order, type AB; $\binom{5}{3} = 10$ interactions of
second order, type ABC; $\binom{5}{4} = 5$ interactions of third order, type ABCD; and one
residual or interaction of fourth order, type ABCDE. The interactions of zero, first and
second order are of a type already familiar:

$$
\begin{gathered}
\sum (x_{j....} - x_{.....})^2 \\
\sum (x_{jk...} - x_{j....} - x_{.k...} + x_{.....})^2 \\
\sum (x_{jkl..} - x_{jk...} - x_{.kl..} - x_{j.l..} + x_{j....} + x_{.k...} + x_{..l..} - x_{.....})^2
\end{gathered} \tag{23.29}
$$

The third-order interactions are typified by

$$
\begin{aligned}
\sum (&x_{jklm.} - x_{jkl..} - x_{.klm.} - x_{jk.m.} - x_{j.lm.} + x_{jk...} + x_{j.l..} + x_{j..m.} \\
 &+ x_{.kl..} + x_{.k.m.} + x_{..lm.} - x_{j....} - x_{.k...} - x_{..l..} - x_{...m.} + x_{.....})^2
\end{aligned} \tag{23.30}
$$

and the reader will be able to write down the residual for himself.

As usual, the 31 terms all furnish independent estimators of the variance on the
hypothesis of homogeneity, and if this is rejected we may consider the alternative
represented by

$$x_{jklmn} = a_j + b_k + c_l + d_m + e_n + \zeta_{jklmn}. \tag{23.31}$$

The complete analysis in such cases may become very complex, but frequently it is sufficient
to consider only sums of squares suggested for investigation by prior expectations.
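
The enumeration of components in an n-way classification is easily mechanised. The following sketch lists every interaction for five classes, with its order and degrees of freedom, and confirms that there are 31 components whose degrees of freedom add up to N − 1. The numbers of classes chosen (3, 4, 2, 5, 2) are hypothetical, and the function name is illustrative only.

```python
from itertools import combinations
from math import prod

def interaction_table(class_sizes):
    """List (label, order, d.f.) for every interaction of an n-way classification.

    class_sizes maps the name of each classification to its number of classes.
    An interaction involving m classifications has order m - 1, and its degrees
    of freedom are the product of (number of classes - 1) over those involved.
    """
    names = list(class_sizes)
    rows = []
    for m in range(1, len(names) + 1):
        for combo in combinations(names, m):
            df = prod(class_sizes[name] - 1 for name in combo)
            rows.append(("".join(combo), m - 1, df))
    return rows

# Hypothetical numbers of classes for the five classifications.
sizes = {"A": 3, "B": 4, "C": 2, "D": 5, "E": 2}
rows = interaction_table(sizes)

counts_by_order = [sum(1 for _, order, _ in rows if order == k) for k in range(5)]
total_df = sum(df for _, _, df in rows)
sample_size = prod(sizes.values())

print(len(rows), counts_by_order, total_df, sample_size - 1)
```

The last component in the list, labelled ABCDE, is the residual or interaction of fourth order.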


Example 23.3 

The following data show the percentage water-content in a number of samples of 
a commercial product. Six samples were chosen ; each sample was tested by four different 
operators ; and each operator carried out the determination by three different methods. 
We have thus a 6 x 4 x 3 classification. 


TABLE 23.6

Percentage Water-Content of Six Samples determined by Four Operators using Three Methods.

                                     Operators.
                  1               2               3               4
Samples.        Tests.          Tests.          Tests.          Tests.
              1   2   3       1   2   3       1   2   3       1   2   3

   1         59  61  61      57  60  58      55  58  62      54  56  59
   2         57  58  60      57  58  58      61  60  57      60  56  58
   3         55  57  59      55  55  56      54  52  58      53  55  55
   4         60  57  58      56  57  57      54  58  55      61  59  58
   5         61  61  60      59  58  59      61  57  60      62  60  60
   6         63  59  60      62  63  61      64  62  59      59  60  61

We will first of all analyse the variance systematically with rather more arithmetical
detail than is usually required, in order to illustrate the process.

A great deal of work is saved if we take a working mean at 60. The table then becomes:


TABLE 23.7

                                                  Operators.
                     1                      2                      3                      4
Samples.           Tests.                 Tests.                 Tests.                 Tests.              Totals.
            1    2    3  Totals    1    2    3  Totals    1    2    3  Totals    1    2    3  Totals

   1       -1    1    1       1   -3    0   -2      -5   -5   -2    2      -5   -6   -4   -1     -11       -20
   2       -3   -2    0      -5   -3   -2   -2      -7    1    0   -3      -2    0   -4   -2      -6       -20
   3       -5   -3   -1      -9   -5   -5   -4     -14   -6   -8   -2     -16   -7   -5   -5     -17       -56
   4        0   -3   -2      -5   -4   -3   -3     -10   -6   -2   -5     -13    1   -1   -2      -2       -30
   5        1    1    0       2   -1   -2   -1      -4    1   -3    0      -2    2    0    0       2        -2
   6        3   -1    0       2    2    3    1       6    4    2   -1       5   -1    0    1       0        13

Totals     -5   -7   -2     -14  -14   -9  -11     -34  -11  -13   -9     -33  -11  -14   -9     -34      -115

We have shown the totals of the tests for each operator, of the tests for all operators, and 
of samples for each test. 

We now form three two-way tables from this by adding the values of one of the
variates, e.g.

TABLE 23.8

                          Operators.
Samples.       1       2       3       4     Totals.

   1           1      -5      -5     -11       -20
   2          -5      -7      -2      -6       -20
   3          -9     -14     -16     -17       -56
   4          -5     -10     -13      -2       -30
   5           2      -4      -2       2        -2
   6           2       6       5       0        13

Totals       -14     -34     -33     -34      -115

TABLE 23.9

                      Tests.
Samples.       1       2       3     Totals.

   1         -15      -5       0       -20
   2          -5      -8      -7       -20
   3         -23     -21     -12       -56
   4          -9      -9     -12       -30
   5           3      -4      -1        -2
   6           8       4       1        13

Totals       -41     -43     -31      -115

TABLE 23.10

                          Operators.
Tests.         1       2       3       4     Totals.

   1          -5     -14     -11     -11       -41
   2          -7      -9     -13     -14       -43
   3          -2     -11      -9      -9       -31

Totals       -14     -34     -33     -34      -115


As we have inserted the totals of various kinds in Table 23.7 these subsidiary tables
can be picked out at once; but in general, totals are not available in the original (and for
four-way classifications it is difficult to find a form of tabular presentation which will permit
of their insertion) so that the tables have to be separately compiled. In practice I find it
convenient to do so in any case to avoid picking out the wrong figures in the original table.

Pursuing the condensation process, we should now derive three one-way tables from
Tables 23.8 to 23.10, but in fact the row and column totals already give us what is required
(and incidentally provide a check on the arithmetic).
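
The condensation process lends itself to mechanical checking. The sketch below, with illustrative names throughout, enters the data of Table 23.6 as a nested list, subtracts the working mean, and forms the three two-way tables and the one-way totals by summing over each classification in turn.

```python
WORKING_MEAN = 60
P, Q, R = 6, 4, 3   # numbers of samples, operators and tests

# percent[j][k][l]: sample j, operator k, test l (data of Table 23.6).
percent = [
    [[59, 61, 61], [57, 60, 58], [55, 58, 62], [54, 56, 59]],
    [[57, 58, 60], [57, 58, 58], [61, 60, 57], [60, 56, 58]],
    [[55, 57, 59], [55, 55, 56], [54, 52, 58], [53, 55, 55]],
    [[60, 57, 58], [56, 57, 57], [54, 58, 55], [61, 59, 58]],
    [[61, 61, 60], [59, 58, 59], [61, 57, 60], [62, 60, 60]],
    [[63, 59, 60], [62, 63, 61], [64, 62, 59], [59, 60, 61]],
]

# Deviations from the working mean (the body of Table 23.7).
dev = [[[x - WORKING_MEAN for x in cell] for cell in row] for row in percent]

# Two-way tables: sum over tests, over operators and over samples respectively.
samples_by_operators = [[sum(dev[j][k]) for k in range(Q)] for j in range(P)]
samples_by_tests = [
    [sum(dev[j][k][l] for k in range(Q)) for l in range(R)] for j in range(P)
]
tests_by_operators = [
    [sum(dev[j][k][l] for j in range(P)) for k in range(Q)] for l in range(R)
]

# One-way totals, which also serve as a check on the arithmetic.
sample_totals = [sum(row) for row in samples_by_operators]
operator_totals = [sum(col) for col in zip(*samples_by_operators)]
test_totals = [sum(col) for col in zip(*samples_by_tests)]
grand_total = sum(sample_totals)

print(sample_totals, operator_totals, test_totals, grand_total)
```

The agreement of the grand total obtained from any of the three two-way tables is the arithmetical check referred to above.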

Now we proceed to find the various sums of squares. For the total of all observations
we find -115, and for the sum of squares of observations 653. Thus

$$x_{...} = \frac{-115}{72} = -1.597222$$

$$N x_{...}^2 = 183.680556$$

$$
\begin{aligned}
\sum (x_{jkl} - x_{...})^2 &= \sum x_{jkl}^2 - N x_{...}^2 \\
 &= 653 - 183.680556 \\
 &= 469.319444
\end{aligned} \tag{23.32}
$$

with 6 × 4 × 3 − 1 = 71 degrees of freedom.




For the interactions of order zero we require the sums of type $\sum (x_{j..} - x_{...})^2$,
where summation takes place over the N values. It is, however, unnecessary to work out
the means $x_{j..}$. Consider, for example, the sum of squares between samples. From the
totals of Table 23.8 or Table 23.9 we find (j denoting samples)

$$\sum_j (12\,x_{j..})^2 = (-20)^2 + (-20)^2 + \ldots + 13^2 = 5009,$$

where the summation is over six values only. Thus, for summation over the 72 values,

$$\sum (x_{j..})^2 = \frac{12}{12^2}\,(5009) = 417.416667.$$

Hence

$$\sum (x_{j..} - x_{...})^2 = 417.416667 - 183.680556 = 233.736111 \tag{23.33}$$

with 6 − 1 = 5 d.f.

Similarly (k denoting operators) we find

$$\sum (x_{.k.} - x_{...})^2 = \frac{3597}{18} - 183.680556 = 16.152778 \tag{23.34}$$

with 3 d.f.; and (l denoting tests)

$$\sum (x_{..l} - x_{...})^2 = \frac{4491}{24} - 183.680556 = 3.444444 \tag{23.35}$$

with two degrees of freedom.

Now we require first-order interactions. We have (summation being over the N
values)

$$
\begin{aligned}
\sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2
 &= \sum \{(x_{jk.} - x_{...}) - (x_{j..} - x_{...}) - (x_{.k.} - x_{...})\}^2 \\
 &= \sum (x_{jk.} - x_{...})^2 + \sum (x_{j..} - x_{...})^2 + \sum (x_{.k.} - x_{...})^2 \\
 &\quad - 2 \sum (x_{jk.} - x_{...})(x_{j..} - x_{...}) - 2 \sum (x_{jk.} - x_{...})(x_{.k.} - x_{...}) \\
 &= \sum (x_{jk.} - x_{...})^2 - \sum (x_{j..} - x_{...})^2 - \sum (x_{.k.} - x_{...})^2
\end{aligned} \tag{23.36}
$$

(the product term in $(x_{j..} - x_{...})(x_{.k.} - x_{...})$ vanishing on summation), and thus the
first-order interaction term is ascertainable from $\sum (x_{jk.})^2$ and quantities which
have already been computed.

From the body of Table 23.8 (remembering that summation relates to 72 values and
hence that each value in the table is counted 3 times) we find

$$\sum (x_{jk.})^2 = \frac{3}{3^2}\,\{1^2 + (-5)^2 + \ldots\} = \frac{1499}{3} = 499.666667.$$

The interaction term is then

$$499.666667 - 183.680556 - 233.736111 - 16.152778 = 66.097222 \tag{23.37}$$

with (6 − 1)(4 − 1) = 15 d.f.

Similarly in the body of Table 23.9 we find for the sum of squares 1915. Hence the
interaction of samples and tests is

$$\frac{1915}{4} - 183.680556 - 233.736111 - 3.444444 = 57.888889. \tag{23.38}$$

In the body of Table 23.10 the sum of squares is 1245. Hence the interaction of tests
and operators is

$$\frac{1245}{6} - 183.680556 - 16.152778 - 3.444444 = 4.222222. \tag{23.39}$$

Finally, the residual is given by the difference of the total sum of squares and the
interactions already found, namely by

$$469.319444 - 233.736111 - 16.152778 - 3.444444 - 66.097222 - 57.888889 - 4.222222 = 87.777778 \tag{23.40}$$

with (6 − 1)(4 − 1)(3 − 1) = 30 degrees of freedom.

We can now make up the table of variance analysis as follows : — 

TABLE 23.11

Analysis of Variance of Data of Table 23.7.

                           Sum of Squares.     d.f.     Quotient.

Between samples (S)            233.736           5       46.747
   „    operators (O)           16.153           3        5.384
   „    tests (T)                3.444           2        1.722
Interaction SO                  66.097          15        4.406
   „        OT                   4.222           6        0.704
   „        ST                  57.889          10        5.789
Residual                        87.778          30        2.926

Totals                         469.319          71
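
The whole of Table 23.11 can be recovered from the three condensed tables and the two figures −115 and 653, exactly as in the hand computation. The sketch below follows that route; the function and variable names are illustrative, and the divisors 12, 18, 24, 3, 4 and 6 are the numbers of observations entering each total.

```python
N = 72                 # 6 samples x 4 operators x 3 tests
GRAND_TOTAL = -115     # total of deviations from the working mean 60
SUM_OF_SQUARES = 653   # sum of squared deviations from the working mean

# Bodies of Tables 23.8, 23.9 and 23.10 (totals about the working mean).
samples_operators = [[1, -5, -5, -11], [-5, -7, -2, -6], [-9, -14, -16, -17],
                     [-5, -10, -13, -2], [2, -4, -2, 2], [2, 6, 5, 0]]
samples_tests = [[-15, -5, 0], [-5, -8, -7], [-23, -21, -12],
                 [-9, -9, -12], [3, -4, -1], [8, 4, 1]]
tests_operators = [[-5, -14, -11, -11], [-7, -9, -13, -14], [-2, -11, -9, -9]]

correction = GRAND_TOTAL ** 2 / N   # N times the square of the general mean

def ss_about_mean(totals, per_total):
    """Sum of squares about the general mean, summed over all N values,
    for totals each made up of `per_total` observations."""
    return sum(t * t for t in totals) / per_total - correction

def cells(table):
    return [value for row in table for value in row]

def column_totals(table):
    return [sum(col) for col in zip(*table)]

ss = {}
ss["S"] = ss_about_mean([sum(row) for row in samples_operators], 12)
ss["O"] = ss_about_mean(column_totals(samples_operators), 18)
ss["T"] = ss_about_mean(column_totals(samples_tests), 24)
ss["SO"] = ss_about_mean(cells(samples_operators), 3) - ss["S"] - ss["O"]
ss["ST"] = ss_about_mean(cells(samples_tests), 4) - ss["S"] - ss["T"]
ss["OT"] = ss_about_mean(cells(tests_operators), 6) - ss["O"] - ss["T"]
ss["Total"] = SUM_OF_SQUARES - correction
ss["Residual"] = ss["Total"] - sum(ss[k] for k in ("S", "O", "T", "SO", "ST", "OT"))

df = {"S": 5, "O": 3, "T": 2, "SO": 15, "ST": 10, "OT": 6, "Residual": 30}
quotients = {name: ss[name] / df[name] for name in df}

for name in df:
    print(f"{name:9s}{ss[name]:10.3f}{df[name]:5d}{quotients[name]:9.3f}")
```

The residual is obtained by subtraction, so any slip in an earlier item will show itself there; this is one reason for retaining more decimal places during the work than are finally printed.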



We proceed to discuss the data in the light of this analysis. 

The most striking feature of the table is the size of the quotient between samples.
The variance ratio here is $\dfrac{46.747}{2.926} = 15.976$, with a corresponding value of z equal to 1.38.
For $\nu_1 = 5$, $\nu_2 = 30$ the 0.1-per-cent point is 0.8554, and the ratio is highly significant.

We remark in passing on a point which will be taken up later. The ordinary z-test
gives the probabilities that the ratio of two variances chosen at random does not exceed
a given value. But in this case we have deliberately picked out the largest quotient for
one of our estimates. If z had fallen at the 5-per-cent level we could not have argued that
the odds were 19 to 1 against the event. They are very much less, since we have
deliberately chosen the largest value for comparison with the residual. However, in the present
case our probability is so small that we can confidently assume the significance of z (see
23.27 below).
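
For readers more accustomed to the variance ratio itself than to z, the two are connected by z = ½ log_e (ratio). The following sketch converts in both directions, using only the figures quoted above; the function names are illustrative.

```python
import math

def z_from_ratio(ratio):
    """z is half the natural logarithm of the variance ratio."""
    return 0.5 * math.log(ratio)

def ratio_from_z(z):
    """Inverse relation: the variance ratio corresponding to a given z."""
    return math.exp(2.0 * z)

ratio = 46.747 / 2.926                   # quotient between samples / residual quotient
z = z_from_ratio(ratio)
critical_ratio = ratio_from_z(0.8554)    # ratio equivalent to the quoted 0.1-per-cent point of z

print(round(ratio, 3), round(z, 3), round(critical_ratio, 2))
```

The observed ratio is nearly three times the ratio corresponding to the 0.1-per-cent point, which is the sense in which it is described as highly significant.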

Our first inference, then, is that the whole sample is not homogeneous. There appear 
to be variations from sample to sample which are not assignable to differences between 
tests or operators, and if we wished to standardise our product with greater accuracy we 
should be led to examine the manufacturing process. This conclusion is, however, subject 
to a point which we discuss in the next example. 

Having rejected the hypothesis of homogeneity we are now faced with the question
whether the other quotients in Table 23.11 can be compared so as to assess the relative
variability of the other factors. We must then take a new hypothesis, and we will suppose
that the variable may be written

$$x_{jkl} = a_j + \xi_{jkl}, \tag{23.41}$$

where $a_j$ is an unknown quantity expressing the accepted variation between samples.
Unless there is something very peculiar about the tests or operators it is reasonable to
suppose that the variation between samples can be isolated in this way. We will now
suppose that the $\xi$'s, not the $x$'s, are distributed normally with common mean and variance $v$.

If the values given by (23.41) are substituted in the various constituent items of Table
23.5, it will be found that except for the variation between samples all the other sums of
squares assume the same form with $\xi$ written instead of $x$. This, of course, follows from
23.20 of which our present hypothesis is a particular case. On the hypothesis of (23.41)
we are thus enabled to compare the quotients in the table in the usual way. The element
of variation between samples has, so to speak, been abstracted from the discussion.

We then turn to the sum of squares between operators in Table 23.11. The variance
ratio is $\dfrac{5.384}{2.926}$, $\nu_1 = 3$, $\nu_2 = 30$, and is not significant. Similarly, for the sum
of squares between tests we find a ratio of $\dfrac{1.722}{2.926}$, again not significant. Provisionally we
conclude that there is no evidence of variation between operators and tests, apart from
pure sampling effects.

Now we have to consider the interactions. For that of SO we have the variance ratio
$\dfrac{4.406}{2.926} = 1.51$, which is not significant. We find the same for the interaction ST. For
OT we have (taking the larger variance as the numerator)

$$z = \tfrac{1}{2} \log_e \frac{2.926}{0.704} = 0.713, \quad \nu_1 = 30, \quad \nu_2 = 6.$$

This value is just beyond the 5 per cent point and, judged by itself, might have been regarded
as significant; but taken in conjunction with the others it may, perhaps, be accepted as
a permissible sampling fluctuation.

To sum up, therefore, the only evidence of deviation from homogeneity appears in the
sample-differences, and we see no reason to reject the hypothesis represented by (23.41).
Since all the other items in the analysis, apart from that between samples, are
homogeneous, we could condense the table into the form:

                      Sum of Squares.     d.f.     Quotient.

Between samples           233.736           5       46.747
Remainder                 235.583          66        3.569

Totals                    469.319          71



The reader may wonder why, in carrying out the tests of significance, we have
throughout used the residual quotient as the denominator of the variance ratio, and not, for instance,
one of the interactions. There are two reasons. First, the residual has more degrees of
freedom, so that it is preferable notwithstanding that the z-test is valid for any number
of degrees of freedom. Second, the residual is not so likely to be affected by interactions
which, though not emerging into significance, might nevertheless exist. But once we have
established that an interaction is not significant, there is no reason why it should not be
amalgamated with the residual, as in the condensed table above.
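
Amalgamation consists simply in adding sums of squares and degrees of freedom. A minimal sketch, using the rounded figures of Table 23.11 and illustrative names, is:

```python
# (sum of squares, d.f.) for every item of Table 23.11 other than "between samples".
items = {
    "operators": (16.153, 3),
    "tests": (3.444, 2),
    "SO": (66.097, 15),
    "OT": (4.222, 6),
    "ST": (57.889, 10),
    "residual": (87.778, 30),
}

pooled_ss = sum(ss for ss, _ in items.values())
pooled_df = sum(df for _, df in items.values())
pooled_quotient = pooled_ss / pooled_df

print(round(pooled_ss, 3), pooled_df, round(pooled_quotient, 3))
```

The result agrees with the line “Remainder” of the condensed table.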

Example 23.4 

There is a point of great importance concerning the inference from analyses of variance, 
which we will illustrate by an imaginary example based on the data we have just con- 
sidered. Suppose our analysis of variance were of the following form : — 


                       Sum of Squares.     d.f.     Quotient.

Between samples             125              5         25
Between operators            60              3         20
Interaction SO              150             15         10
Remainder                    48             48          1

Totals                      383             71

We will suppose that the sums of squares between tests and the other first-order
interactions are not significant, so that they can be amalgamated with the residual to give a
remainder with 48 degrees of freedom as shown.

On this evidence the sums of squares between samples and between operators are both
significant, as also is the interaction SO. What inference can be drawn about the
variability of the product from one sample to another? We know that the readings differ
significantly; but may not this difference itself be due to the demonstrated variation
between operators, or does it really exist? Is there in fact any variability in the
water-content of the product, apart from the sampling effect in homogeneous variation?

The significance of the SO interaction means that we cannot now regard the effects
of operator and sample as independent. We must consider the possibility of entanglement.
This is not the only explanation: there may be some other specific cause of variation
present which we have not thought of, and on which our present data throw no light. But
in this case there is some prior possibility that samples and operators are “entangled” or
interacting in the ordinary sense. An operator may be getting better results from his
material when it has high water-content than in the reverse case; or, knowing that the
mean content is near 60 per cent, he may unconsciously (or even consciously) bring his
determinations nearer to that figure and hence reduce their spread.

In a case of this kind, and indeed in all statistical inquiries, it is important to have
a clear idea of the question which is being asked and of the population to which it relates.
We have had a number of samples and have tested them, by four operators each using
three tests. So far as we can see, the tests are equivalent but the operators are not. All
the same, we are not very interested in the variation among operators (unless this is
an experiment in psychology and not in chemistry). What we want to know is whether
[...] determinations by different operators. Our particular four are themselves samples of
a population of operators.

If we confine our attention to the four operators and suppose that each has a specific
reaction to particular samples so that

$$x_{jk} = m_{jk} + \xi_{jk}, \tag{23.42}$$

where $\xi$ is a normal random residual with variance $v$ for all $j$, $k$, then in the usual
way we find

$$E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = (p - 1)(q - 1)v + \sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2. \tag{23.43}$$

But suppose we consider the matter from a different viewpoint. Regard $m_{jk}$ as itself
chosen at random from a normal population of operators with variance $v'$. Then, taking
expectations over this population in addition, we find from (23.43)

$$E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = (p - 1)(q - 1)(v + v'). \tag{23.44}$$

Thus the interaction term provides an unbiassed estimator of the variance $v + v'$ of $x_{jk}$.
By “unbiassed” in this connection we mean that the average over all determinations and
all operators will give the variance of $x_{jk}$ in the population of all determinations and all
operators.

Similarly we shall have, on the same interpretation,

$$
\begin{aligned}
E \sum (x_{j.} - x_{..})^2 &= (p - 1)(v + v') \\
E \sum (x_{.k} - x_{..})^2 &= (q - 1)(v + v')
\end{aligned} \tag{23.45}
$$

and hence the ratio of either interaction of zero order to the first-order interaction may be
tested for homogeneity. Our analysis then becomes:


                       Sum of Squares.     d.f.     Quotient.

Between samples             125              5         25
Between operators            60              3         20
Residual (SO)               150             15         10

Totals                      335             23


Neither ratio is now significant. For the sum of squares between samples we have
a ratio of 2.5, $\nu_1 = 5$, $\nu_2 = 15$, which is below the 5 per cent point.

Thus we should conclude that, regarding the data as a member of possible samples from
all possible operators, there is little or no evidence of real variation from sample to sample.
This is quite consistent with the inference we drew at the beginning of the example as to
the “significance” of the terms concerned, though at first sight it appears directly
contradictory. In the first case we inferred that for these four operators there were
significant differences in their determinations for the samples, so that sample-differences are
“real” in the sense that they cannot be attributed solely to random variation in
homogeneous material. In the second case we enlarge the domain by considering operators as
subject to “error” in the sense that one human being differs from another, and find that
sample-differences can now be ascribed to variation in the population of operators.

No further emphasis is needed on the care necessary for the proper interpretation of
the results of an analysis of variance. The nature of the population which is being
considered should be brought explicitly to mind in every case; and the reader should form
the habit of asking himself, whenever a result is found to be “significant”: significant
of what?

Arithmetic of Variance Analysis 

23.26. Before considering further examples we will dispose of a few points arising
from the calculation of the constituent sums of squares and the application of the z-test
in determining the significance of variance-ratios.

The calculation of sums of squares for an n-way classification can very conveniently
be carried out by the use of a punched-card system when the data are numerous, and some
remarkable computing feats have been performed by this technique. For ordinary laboratory
work with a machine, the process of Example 23.3 is possibly the best, though some
modifications may be made to suit individual taste.

The main work lies in computing the total sum of squares. This is done by finding
the sum of squares of observations from the original data (with a convenient working
mean) and the sum of observations obtained at the same time. The formula

$\sum (x_{jkl} - x_{...})^2 = \sum x_{jkl}^2 - N x_{...}^2 = \sum x_{jkl}^2 - \frac{(\sum x_{jkl})^2}{N}$    (23.46)

then gives the total sum required. The quantity $(\sum x_{jkl})^2 / N$ is constantly needed and should
be recorded. It is useful to preserve a few more decimal places than will ultimately be
used in the final presentation of the analysis.
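
The working-mean arithmetic of (23.46) may be sketched as follows. This is only an illustration under our own assumptions: the function name `total_sum_of_squares` and the working mean of 50 are hypothetical choices, and the eight yields are borrowed from the first block of Table 23.12 purely as sample figures.

```python
def total_sum_of_squares(values, working_mean=0.0):
    """Return (sum of squares about the mean, correction term).

    Deviations are taken from a convenient working mean; the correction
    (sum of deviations)^2 / N is then subtracted, as in (23.46).
    """
    n = len(values)
    deviations = [v - working_mean for v in values]
    sum_dev = sum(deviations)                    # sum of observations (shifted)
    sum_sq = sum(d * d for d in deviations)      # sum of squares (shifted)
    correction = sum_dev * sum_dev / n           # the quantity to be recorded
    return sum_sq - correction, correction


# Illustrative figures only (hypothetical choice of sample and working mean).
yields = [49.5, 46.8, 62.0, 45.5, 62.8, 57.0, 48.8, 44.2]
ss, correction = total_sum_of_squares(yields, working_mean=50.0)

# Direct computation about the true mean, for comparison.
mean = sum(yields) / len(yields)
direct = sum((v - mean) ** 2 for v in yields)
print(round(ss, 4), round(direct, 4))
```

The result does not depend on the working mean chosen, which serves only to keep the figures small on the machine.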

The original data are then condensed into $n$ $(n - 1)$-way tables by summing over
each class in turn. In Example 23.3 this was done so as to give three tables : Operators-
Samples, Tests-Samples and Operators-Tests. The main body of these tables gives means
of the type $x_{jk.}$ multiplied by a constant factor. A further condensation will give
sets of means of type $x_{j..}$ ; and so on, as far as is required.

From the condensed tables we can then determine the sums of squares of means of 
various orders, and hence the interactions. The main pitfall lies in the way of the applica- 
tion of the correct multipliers and divisors — it has to be borne in mind that the summation 
takes place over all values of the sample. 

Suppose, for example, we have a four-way classification into classes with p, q, r and s
numbers of members. The first condensation gives us four tables of which a typical one
is p × q × r, based on the sum of s members. The next condensation gives us six two-way
tables typified by p × q, based on the sum of rs members. The third gives us four one-way
tables such as p, based on qrs members. Consider the variance between p-classes :

$\sum (x_{j...} - x_{....})^2 = \sum x_{j...}^2 - N x_{....}^2$    (23.47)

In the condensed one-way table of p classes each term is to be counted qrs times, and
thus, if S is the sum of squares in this table as it stands,

$\sum x_{j...}^2 = \frac{S}{qrs}$    (23.48)
For the first-order interaction we have

$\sum (x_{jk..} - x_{j...} - x_{.k..} + x_{....})^2 = \sum (x_{jk..} - x_{....})^2 - \sum (x_{j...} - x_{....})^2 - \sum (x_{.k..} - x_{....})^2$    (23.49)

The last two terms on the right have already been found. We require

$\sum (x_{jk..} - x_{....})^2 = \sum x_{jk..}^2 - N x_{....}^2$    (23.50)

If S' is the sum of squares of elements in the body of the two-way table found by adding
r- and s-items, we find

$\sum x_{jk..}^2 = \frac{S'}{rs}$    (23.51)

and so on. The general process will now be clear.

Unfortunately there is no convenient independent check on the calculations. The 
various condensed tables are self-checking since their totals are the sum of all observations, 
but the sums of squares do not check with anything. It is, of course, possible to evaluate 
each individual term in the residual and to check by summing squares, but this is too 
laborious for use except in the simplest cases. 


Use of the z-test for Several Variance-ratios 

23.27. In the complete analysis of n classes there are $2^n - 1$ elements, and the
number of variance ratios arising for test may be considerable. The z-test gives the probability
that a particular value chosen at random will be exceeded. If therefore we pick
out the largest ratios for test, the chance that one of them is “ significant ” in the sense
of exceeding the 100P-per-cent point is a good deal greater than P, and we run into the
danger of attributing significance to what may be a pure sampling effect.

Suppose we make r different and independent tests of r values of z. The chance that
each does not exceed a fixed value (depending on the number of degrees of freedom) is
1 − P, where P is some assigned level of significance. Hence the chance that none of
them exceeds its appropriate value is

$(1 - P)^r = 1 - rP$, approximately,    (23.52)

provided that P and rP are small. For instance, if P = 0.01 and r = 7 the probability
that no z exceeds its appropriate significance value is 0.93, and thus there is a probability
of 0.07 that at least one of them will do so.
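
The arithmetic of (23.52) is easily checked. The short sketch below, in which the function name `chance_at_least_one` is our own and the tests are assumed independent as in the text, compares the exact chance with the approximation rP.

```python
def chance_at_least_one(P, r):
    """Chance that at least one of r independent tests exceeds its
    100P-per-cent point, with the first-order approximation rP."""
    exact = 1.0 - (1.0 - P) ** r
    approx = r * P
    return exact, approx


exact, approx = chance_at_least_one(0.01, 7)
print(round(exact, 4), round(approx, 4))
```

For P = 0.01 and r = 7 the two figures agree to two decimal places; the approximation deteriorates as rP grows.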

In practice the problem of numerous comparisons is more complicated because they 
are not independent. In such circumstances our judgment of significance has to incor- 
porate an element of the intuitive. However, if all the comparisons are based on the 
common residual quotient it is possible to find the probabilities that the largest of r values 
exceeds assigned values. The resulting expressions are complicated, even when all the 
sums of squares have the same degrees of freedom, but reference may be made to Hartley 
(1938) for approximations and to Cochran (1941) and Finney (1941a) for exact expressions. 
The conclusion reached by Finney is that if the degrees of freedom in the residual are 
sufficiently numerous the ratios may be treated as completely independent. 


23.28. There is a particular case of the n-way classification which is worth special
mention, namely, that for which each classification is a simple dichotomy, so that there
are $2^n$ subgroups. This case arises frequently when so-called “ factorial ” experiments
are being conducted to determine the effect of a treatment which is either applied or withheld.
The analysis of variance remains the same in principle, but of course the arithmetic
becomes a good deal simpler.


Example 23.5 (F. Yates, Supp. J.R.S.S., 1935, 2, 181)

An area of ground was sown with peas and divided into 24 plots in the manner shown
in Table 23.12. The plots received, or did not receive, dressings of nitrogen (N), phosphate
(P) and potash (K) in the manner shown, the yields in pounds being given in the table.


TABLE 23.12

Yields of Peas and Manurial Treatments on 24 Plots

   PK     49.5   |   None   46.8   |   N      62.0   |   K      45.5
   NP     62.8   |   NK     57.0   |   NPK    48.8   |   P      44.2

   N      59.8   |   K      55.5   |   NP     52.0   |   NK     49.8
   NPK    58.5   |   P      56.0   |   None   51.5   |   PK     48.8

   P      62.8   |   N      69.5   |   NK     57.2   |   PK     53.2
   NPK    55.8   |   K      55.0   |   NP     59.0   |   None   56.0


There is some purpose here in the alternation of treatments, but that need not concern us 
for the present. We have 24 observations in four classes, viz. blocks (3), nitrogen (2), 
phosphate (2) and potash (2), giving 3x2x2x2 = 24 records. 

Condensing the table by adding blocks we get the following :

   No treatment     N       P       K       NP      NK      PK      NPK     Total

      154.3       191.3   163.0   156.0   173.8   164.0   151.5   163.1    1317.0


Condensing according to the three treatments we have :

                 N        not-N     Totals

   P           336.9      314.5      651.4
   not-P       355.3      310.3      665.6

   Totals      692.2      624.8     1317.0


                 K        not-K     Totals

   P           314.6      336.8      651.4
   not-P       320.0      345.6      665.6

   Totals      634.6      682.4     1317.0


                 N        not-N     Totals

   K           327.1      307.5      634.6
   not-K       365.1      317.3      682.4

   Totals      692.2      624.8     1317.0


We omit the remaining calculations. The analysis in its final form is given in
Table 23.13.


TABLE 23.13

Analysis of Variance of the Data of Table 23.12

                              Sums of Squares.    d.f.    Quotient.

   Blocks (B) . . . .              177.803          2        88.90
   N                               189.282          1       189.28
   P                                 8.402          1         8.40
   K                                95.202          1        95.20
   Interaction BN                   94.255          2        47.13
       „       BP                    2.260          2         1.13
       „       BK                   23.685          2        11.84
       „       NP                   21.281          1        21.28
       „       NK                   33.134          1        33.13
       „       PK                    0.481          1         0.48
       „       BNP . . . .          25.302          2        12.65
       „       BNK . . . .          36.004          2        18.00
       „       BPK . . . .           3.782          2         1.89
       „       NPK . . . .          37.003          1        37.00
   Residual (BNPK) . . . .         128.489          2        64.24

   Totals                          876.365         23



We have carried out the analysis in full so as to illustrate the arithmetical process
for a four-way classification, but we may note at once that it is unduly elaborate. There
are only 24 observations in the data, and we cannot expect them to provide all the answers
to the questions which we could frame as to the significance of the various constituent
terms in the analysis. This is borne out by the z-test. The residual variance is 64.24
with two degrees of freedom. For $\nu_1 = 1$, $\nu_2 = 2$ the variance ratio at the 1-per-cent
point is 98.49 and that for $\nu_1 = 2$, $\nu_2 = 2$ at the same point is 99.00. Only values greater
than about 100 times 64.24 or less than 1/100th of that value would thus be significant.
Only the interaction PK falls outside this range, and even this, among so many, can hardly
be regarded as significant.

The inquiry is not, however, completely frustrated. Since the second-order interactions
are not significant, we amalgamate them with the residual to give a remainder
sum of squares of 230.580 with nine d.f. and a quotient of 25.62. It will now be found
that among the first-order interactions only two are significant, PK and BP being too
small. Had they been too large we might have attributed some genuine significance to 
this result, but it is not very plausible to suppose that there is a “ real ” interaction between 
blocks and phosphate, or that phosphate and potash inhibit each other’s action. The 
differences from expectation are more probably due to individual soil variation from plot 
to plot. 

If we accept the first-order interactions as not significant, we may amalgamate them 
with the remainder to give the following :

                      Sum of Squares.    d.f.    Quotient.

   Blocks                 177.803          2        88.90
   N                      189.282          1       189.28
   P                        8.402          1         8.40
   K                       95.202          1        95.20
   Remainder              405.676         18        22.54

   Totals                 876.365         23



Here the P-quotient is not significant, but the variance ratio for blocks, 3.94, is near the
5-per-cent point. The N-quotient will be found to be significant at the 1-per-cent point,
the K-quotient near to the 5-per-cent point. Our conclusion is that there is strong indication
that nitrogen influenced the yield, some indication that potash did so, and little indication
that phosphates did so ; and that there is ground for suspecting heterogeneity in the
soil partly because of the difference between blocks and partly from some of the first-order
interactions.
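
The arithmetic of this example may be retraced by machine. The following is a minimal sketch, not the method by which the printed tables were computed: the helper names (`level_totals`, `ss_between`) are our own, the untreated plots are coded by an empty string, and the NK yield in the third block is taken as 57.2, the value which agrees with the NK total of 164.0 in the condensed table.

```python
# (block, treatment, yield); "" marks the plot with no dressing.
# Assumption: the NK plot of block 3 is 57.2, consistent with the NK total 164.0.
plots = [
    (1, "PK", 49.5), (1, "", 46.8), (1, "N", 62.0), (1, "K", 45.5),
    (1, "NP", 62.8), (1, "NK", 57.0), (1, "NPK", 48.8), (1, "P", 44.2),
    (2, "N", 59.8), (2, "K", 55.5), (2, "NP", 52.0), (2, "NK", 49.8),
    (2, "NPK", 58.5), (2, "P", 56.0), (2, "", 51.5), (2, "PK", 48.8),
    (3, "P", 62.8), (3, "N", 69.5), (3, "NK", 57.2), (3, "PK", 53.2),
    (3, "NPK", 55.8), (3, "K", 55.0), (3, "NP", 59.0), (3, "", 56.0),
]

n = len(plots)
grand = sum(y for _, _, y in plots)
correction = grand ** 2 / n                        # (sum of observations)^2 / N
total_ss = sum(y * y for _, _, y in plots) - correction


def level_totals(key):
    """Condense the data into a one-way table of totals for the given class."""
    totals = {}
    for block, treatment, y in plots:
        level = key(block, treatment)
        totals[level] = totals.get(level, 0.0) + y
    return totals


def ss_between(key):
    """Sum of squares between the classes of a balanced one-way condensation."""
    totals = level_totals(key)
    per_class = n // len(totals)                   # observations in each total
    return sum(t * t for t in totals.values()) / per_class - correction


ss = {
    "Blocks": ss_between(lambda b, t: b),
    "N": ss_between(lambda b, t: "N" in t),
    "P": ss_between(lambda b, t: "P" in t),
    "K": ss_between(lambda b, t: "K" in t),
}
df = {"Blocks": 2, "N": 1, "P": 1, "K": 1}

# Everything not ascribed to blocks or main effects goes into the remainder.
remainder = total_ss - sum(ss.values())
remainder_df = (n - 1) - sum(df.values())
remainder_quotient = remainder / remainder_df

for name in ss:
    quotient = ss[name] / df[name]
    print(f"{name:7s} {ss[name]:9.3f} {df[name]:3d} {quotient:8.2f} "
          f"ratio {quotient / remainder_quotient:5.2f}")
print(f"Remainder {remainder:9.3f} {remainder_df:3d} {remainder_quotient:8.2f}")
```

The variance ratios printed in the last column are those to be referred to the z-test (or the equivalent variance-ratio tables) with the degrees of freedom shown.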

In this case, of course, we knew already more or less what was to be expected of these 
data and are the readier to accept the conclusions on that account. Had we known nothing 
of the effect of fertilisers on leguminous crops our conclusions on such slender evidence 
must have been very tentative indeed, particularly if we wished to extend them to peas 
grown on other soils under different climatic conditions with different amounts of fertiliser. 

Example 23.6 (C. E. Gould and W. M. Hampton, Supp. J.R.S.S., 1936, 3, 137)

In the manufacture of optical glass there appear small bubbles known as “seed”, 
which constitute a defect. The glass is made in “ pots ” which take about a year to pre- 
pare, and are run continuously over long periods when once started. There are two pots 
to a furnace and materials are introduced into a pot from time to time which, after fusion, 
provide a “ run ” of glass. Each run provides several days’ work, one day’s work being 
known as a “ journey ”. At each journey quantities of glass are drawn from the pot and 
blown into “ cylinders ”, there being about 18 or 20 to the journey. For the purposes of 
the experiment three cylinders were chosen, the third, tenth and sixteenth, and pieces of 
regular size cut from them for examination as to frequency of seed. The first five journeys 
of each of five runs were sampled. 

We have here a four-way classification 2 (pots) x 5 (runs per pot) x 5 (journeys per 
run per pot) x 3 (cylinders per journey per run per pot). The actual dates of the runs 
were February 16th, May 23rd, June 12th, September 1st and December 6th, so that the 
manufacturing period covered about ten months. We shall assume that the glass was 
of the same type throughout, although in actual fact it was different in one or two cases
— but not sufficiently different to affect the analysis. 

The topic of main interest here is whether the frequency of seed varies significantly 
according to the four factors concerned. If so, the alteration of manufacturing conditions 
may improve the wastage due to seed ; but if not — and the variation is the kind of thing 
which can be accounted for as chance fluctuation in sampling from a homogeneous popula- 
tion — there is little hope of improvement except perhaps by a radical alteration in the 
process affecting all pots, runs and journeys alike. 


TABLE 23.14

Frequency of “ Seed ” in Samples of Glass

                              Pot 1.                         Pot 2.
   Run.   Journey.    Cyl. 1.  Cyl. 2.  Cyl. 3.      Cyl. 1.  Cyl. 2.  Cyl. 3.

    1        1           47       56      100           52       61       88
             2           55       89       93           49       62       97
             3           35       57       56           34       60       72
             4           78       67      113           47       93      118
             5           33       40      128           16       29      130

    2        1           52       66       36           65       80       40
             2           21       61       49          122       97       79
             3           31       39       25           45       54       72
             4           43       72       52          109      120       80
             5           37       51       67           67       85       63

    3        1           50       61       60           75      139      130
             2           33       27       49           46       58       63
             3           24       39       24           15       33       39
             4           18       18       43           22       16       19
             5           28       42       28           27       19       22

    4        1           24       34       43           46       66       24
             2           24       49       42           40      117      105
             3           21       21       51           30       28       34
             4           21       69       48           36       64       53
             5           76       48       42           39       60       78

    5        1           31       54       40           19       93       36
             2           34       24       46           16       12        2
             3          120      122      120           33       58      107
             4          109      119      120           25       63       90
             5           69       49       60           34       43       30


Before plunging into the analysis of variance it is as well to look over the data to see
whether they themselves suggest any lines of inquiry. We observe considerable variability
from journey to journey within the same run, J3 and J4 of run 5 being conspicuous
in pot 1 ; and in run 1 the numbers of seed appear to increase from cylinder 1 to cylinder 3
in a rather exceptional way. The runs themselves seem to differ materially. Prior considerations
also suggested an examination of the way in which frequency of seed varied
between pots, since they were chosen so as to differ substantially in constitution.

A complete analysis of variance of the data is as follows : — 


TABLE 23.15

Analysis of Variance of the Data of Table 23.14.

                               Sums of Squares.    d.f.    Quotient.

   Between pots (P) . . . .           898            1         898
      „    runs (R) . . . .        14,059            4       3,515
      „    journeys (J) . . .       4,355            4       1,089
      „    cylinders (C) .         10,631            2       5,315
   Interaction PR                  16,133            4       4,033
       „       PJ                   4,081            4       1,020
       „       PC                     587            2         293
       „       RJ                  45,934           16       2,871
       „       RC                  11,626            8       1,453
       „       JC                   2,540            8         317
       „       PRJ                  9,711           16         607
       „       RJC                 12,472           32         390
       „       JCP                  1,656            8         207
       „       CPR                  1,862            8         233
   Residual (PRJC) . . . .          8,110           32         253

   Totals                         144,655          149



The second-order interactions will be found non-significant, so we amalgamate with
the residual, giving a sum of squares 33,811, d.f. 96, quotient 352.

It then appears that of the first-order interactions PR, RJ and RC are significant and
PJ may be so. There is beginning to appear evidence of heterogeneity, and that of a rather
complicated kind. It seems that pots are interacting with runs, runs with journeys and
runs with cylinders.

Taking 352 as the quotient, we find that except for P the zero-order interactions are
significant. The five R-means are 68.50, 62.67, 42.23, 47.77 and 59.27, so that the variation
of runs is not a simple rise or fall, which could have been explained as a time-effect. The
five J-means are 58.93, 55.37, 49.97, 64.83 and 51.33, again not a regular effect. The
C-means are 44.46, 59.68 and 64.12, which are significantly different. Inspection of the
table suggests that the first run is the source of the trouble.

With data as heterogeneous as these it is rather difficult to set up a plausible hypothesis 
to test. The interactions of first order suggest that no simple additive effects of the four 
factors will explain observation, and if these terms are used as denominators in tests of 
variance ratios the variation between classes appears on the whole non-significant on the 
usual hypotheses. The analysis, then, suggests several subjects for inquiry as concerns 
the homogeneity of the data, but does not suggest any simple explanation of the observed
figures. The reader may care to refer to the original paper for a more complete discussion 
of the subject. 
23.29. Perhaps we may pause at this point to review progress. We have seen
that for an n-way classification of the special type wherein each subclass contains a single
member, the sum of squares of all observations about their mean can be exhibited as the
sum of a number of such sums. On the hypothesis of normality and homogeneity each
constituent sum of squares, on division by its appropriate number of degrees of freedom,
gives an estimator of the parent variance, and each is distributed independently of
the others. The hypothesis of homogeneity can then be tested in Fisher’s z-distribution,
subject to the adoption of a conservative attitude where many tests are made on the same
data. If the hypothesis is rejected we may replace it by a simple form in which the effects
of the different classes are additive, provided that the interactions are not significant.
The particular ratio chosen for a test depends on the hypothesis concerned, and it is important
to have a clear idea of the exact question to which an answer is sought.

23.30. In the next chapter we shall consider the case when the numbers in different 
subclasses are not equal, discuss the additive hypothesis in more detail, examine the relation- 
ship of variance- and regression-analysis, and extend our results to the analysis of covariance. 
We conclude this chapter by an examination of the important question : what can be 
done with the analysis of variance when the variation is not normal ? 

Non-normal Data

23.31. The analysis of a sum of squares into its constituent sums can, of course, be
undertaken in all circumstances, but the various quotients may not continue to provide
unbiassed estimators of the parent variance if the population is not normal. What is
equally serious, the constituent sums of squares may not be distributed independently.
Thus, when parent normality cannot be assumed, the quotients in the analysis table are
no longer equal within sampling limits and their ratio is distributed in unknown form ; and
even if the form were known it would probably depend on parent parameters and hence
fail to provide an exact test of significance.

The problem has been considered in four ways :

(a) Sampling experiments have been undertaken to see how far moderate deviation
from normality affects the z-distribution ;

(b) Attempts have been made to find transformations of the variate to throw the
parent distributions into forms with equal variances, at least approximately,
before the analysis is applied ;

(c) By introducing a randomising process into the data before they are collected,
attempts have been made to preserve the z-distribution as a close approximation ;
this amounts to a change in the nature of the inference, as we shall see below ;

(d) Tests have been found which can be applied to ranked data irrespective of the
parent form ; this approach is a particular case of (c), but seems to merit special
mention.

We proceed to consider these four possibilities.

23.32. The arithmetic entailed by a single analysis of variance, even in simple cases,
implies that an extensive sampling inquiry into the distribution of z in non-normal populations
would be a very formidable undertaking. E. S. Pearson (1931b) has studied in some
detail the case of a one-way classification with unequal numbers, when the distribution
of z becomes equivalent to that of the correlation ratio. Six populations were chosen,
characterised by the following values :

$\beta_1 = 0$, $\beta_2 = 2.50$ (symmetrical platykurtic) ;

$\beta_1 = 0$, $\beta_2 =$ “IT (symmetrical leptokurtic) ;

$\beta_1 = 0$, $\beta_2 = 7.05$ (symmetrical leptokurtic) ;

$\beta_1 = 0.2$, $\beta_2 = 3.3$ (skew, Type III) ;

$\beta_1 = 0.49$, $\beta_2 = 3.72$ (skew, Type III) ;

$\beta_1 = 0.99$, $\beta_2 = 3.83$ (very skew, Type I, with abrupt start).

The results suggested that for this range of $\beta_1$ and $\beta_2$ the distribution of z is adequately
represented by Fisher’s distribution, and that therefore the homogeneity test may be
applied. The case when the variation changed from group to group was not considered.
It was also concluded that “ it seems probable that the more elaborate forms of analysis 
of variance are also of fairly wide application ”. 

Some work by Eden and Yates (1933) is often referred to as experimental confirmation 
of the same kind, but in fact it was carried out with rather a different object, that of con- 
firming the z-test for data under randomisation (see below, 23.36). 


Variate Transformations 

23.33. Suppose $\xi$ is a new variate $\xi(x)$. Then approximately we shall have

$\mathrm{var}\,\xi = \left(\frac{d\xi}{dx}\right)^2 \mathrm{var}\,x$.    (23.53)

If now the parent variance of the x-distribution is related in some known manner to the
mean, say $f(m) = v$, we have

$\mathrm{var}\,\xi = \left(\frac{d\xi}{dx}\right)^2 f(m)$.

As a further approximation, if x varies about m by small quantities we have

$\mathrm{var}\,\xi = \left(\frac{d\xi}{dx}\right)^2 f(x)$.    (23.54)

Now we wish $\xi$ to have a constant variance, say $\lambda$, and if this is so,

$\frac{d\xi}{dx} = \sqrt{\frac{\lambda}{f(x)}}$

or

$\xi = \int \frac{\sqrt{\lambda}\,dx}{\sqrt{f(x)}}$.    (23.55)

Although this expression is arrived at by approximation we are entitled to hope that
the variate $\xi$ will have almost constant variance, and at any rate a more stable variance
than x.

For instance, if the original variation is thought to be of the Poisson type we have
$f(x) = x$, and from (23.55) are led to consider the transformation

$\xi = \int \frac{\sqrt{\lambda}\,dx}{\sqrt{x}} = \sqrt{x}$    (23.56)

if we choose $\lambda$ to be $\frac14$. Similarly, if the variation is of the binomial type with variance
$p(1 - p)$ we have

$\xi = \int \frac{\sqrt{\lambda}\,dp}{\sqrt{p(1 - p)}} = \sin^{-1}\sqrt{p}$,    (23.57)

on suitable choice of $\lambda$.


23.34. These transformations are designed to “ stabilise ” the variance. They do
not necessarily bring the variate closer to normality, though in some cases they will do so ;
we have, for instance, seen that $\sqrt{2\chi^2}$ tends to normality quicker than $\chi^2$ (12.7). The
following values (Bartlett, 1936d) illustrate the way in which the square-root transformation
stabilises the variance of a Poisson distribution :

   Mean m.     Variance of Poisson        Variance of Poisson
               Variate $\sqrt{x}$.        Variate $\sqrt{x + \frac12}$.

     0.0             0.000                      0.000
     0.5             0.310                      0.102
     1.0             0.402                      0.160
     2.0             0.390                      0.214
     3.0             0.340                      0.232
     4.0             0.306                      0.240
     6.0             0.276                      0.245
     9.0             0.263                      0.247
    12.0             0.259                      0.248
    15.0             0.256                      0.248


The term $\frac12$ in the third column was added by Bartlett on the analogy of a continuity
correction. For m > 3 the variance is evidently quite stable.
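
Values of this kind can be recomputed by direct summation over the Poisson frequencies. The sketch below is our own illustration, not Bartlett's procedure: the function name `poisson_sqrt_variance` and the truncation at 200 terms are assumptions, the latter being ample for the means tabulated.

```python
import math


def poisson_sqrt_variance(m, shift=0.0, terms=200):
    """Variance of sqrt(x + shift) when x is a Poisson variate with mean m.

    Uses var = E(x + shift) - {E sqrt(x + shift)}^2, the expectation of the
    square root being summed term by term over the Poisson frequencies.
    """
    if m == 0:
        return 0.0
    prob = math.exp(-m)                 # frequency of x = 0
    mean_root = 0.0
    for k in range(terms):
        mean_root += prob * math.sqrt(k + shift)
        prob *= m / (k + 1)             # frequency of x = k + 1
    return (m + shift) - mean_root ** 2


for m in (0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 12.0, 15.0):
    print(f"{m:5.1f}  {poisson_sqrt_variance(m):.3f}  "
          f"{poisson_sqrt_variance(m, shift=0.5):.3f}")
```

Both columns approach the limiting value of one-quarter as m increases, the shifted variate doing so from below.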


23.35. If now, having stabilised the variance, we carry out an analysis in the ordinary 
way, our residual sums of squares divided by the appropriate degrees of freedom will con- 
tinue to be unbiassed estimates of the common variance v, even if there are differences 
between the means of the classes. Instead of assuming as part of the hypothesis that the 
different classes are distributed with the same variance, we have transformed the variate 
so that this shall be so, at least to a close approximation. Relying further on the result 
that the transformed variates approximate to normality, or that if they do not the difference
will not seriously vitiate the z-test, we may apply that test to the transformed data
in the usual way.


Example 23.7 (Bartlett, 1936d)

Table 23.16 shows the number of wheat seeds out of 50 which failed to germinate in
four repetitions of an experiment with different treatments.

TABLE 23.16

Germination of Wheat Seeds

   Number of                     Number of Treatment.
   Experiment.      1      2      3      4      5      6      7      Totals.

       1           10     11      8      9      7      6      9        60
       2            8     10      3      7      9      3     11        51
       3            5     11      2      8     10      7     11        54
       4            1      6      4     13      7     10     10        51

   Totals          24     38     17     37     33     26     41       216


In point of fact, treatment 7 was a repetition of treatment 6, the others being different.
The point of interest is whether the treatments exert any effect on germination. We shall
not inquire into any differences between experiments (which appear to be negligible from
the row totals) and shall accordingly consider this as a one-way classification into seven
classes, four numbers to the class.

The presumption is that in any given class the variation is of the binomial type. We
might apply the $\sin^{-1}\sqrt{p}$ transformation, but will adopt instead an ad hoc square-root
transformation obtained as follows :

We have

$v = np(1 - p)$.

Suppose now that $p = p_0 + \delta$ where $\delta$ is small. Then

$v = n(p_0 + \delta - p_0^2 - 2p_0\delta)$

$= n\{(1 - 2p_0)(p - p_0) + p_0 - p_0^2\}$

$= np(1 - 2p_0) + np_0^2$.

If we now put

$\xi = \sqrt{x + k}$,

where $k = \frac{np_0^2}{1 - 2p_0}$ and x is the observed frequency, then $\xi$ will tend to have constant
variance.

In our example the total frequency is 216 out of 1400 seeds, so that we may take as
an estimate of $p_0$ the ratio 216/1400 = 0.15. The transformed variate then becomes

$\xi = \sqrt{x + \frac{50(0.0225)}{0.70}} = \sqrt{x + 2}$, approximately.





On this basis the transformed variate-values are :

TABLE 23.17

Transformed Variates of Table 23.16

   Number of                              Number of Treatment.
   Experiment.       1        2        3        4        5        6        7       Totals.

       1           3.464    3.606    3.162    3.317    3.000    2.828    3.317     22.694
       2           3.162    3.464    2.236    3.000    3.317    2.236    3.606     21.021
       3           2.646    3.606    2.000    3.162    3.464    3.000    3.606     21.484
       4           1.732    2.828    2.449    3.873    3.000    3.464    3.464     20.810

   Totals         11.004   13.504    9.847   13.352   12.781   11.528   13.993     86.009


The analysis of variance is :

                              Sums of Squares.    d.f.    Quotient.

   Between treatments . . . .       3.486           6       0.581
   Residual                         4.316          21       0.206

   Totals                           7.802          27


The sum of squares is particularly easy to obtain, being the sum of the original variates
plus twice the number of variate-values.

The variance ratio, 2.8, is barely significant, being just beyond the 5-per-cent point.
There is little evidence that treatments are exerting any effect on germination, since a
comparison of treatments 6 and 7 (which are the same) indicates that such “ significance ”
as exists may be due to heterogeneity in the seed.
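
The whole of this example can be retraced in a few lines. The sketch below is our own illustration (the variable names are hypothetical): it applies the square-root transformation with the added constant 2 to the counts of Table 23.16 and forms the one-way analysis. Since it works with unrounded square roots, the sums of squares may differ in the third decimal place from those printed above, which rest on values rounded to three places.

```python
import math

# Counts of seeds failing to germinate, by treatment (four experiments each).
counts = {
    1: [10, 8, 5, 1],
    2: [11, 10, 11, 6],
    3: [8, 3, 2, 4],
    4: [9, 7, 8, 13],
    5: [7, 9, 10, 7],
    6: [6, 3, 7, 10],
    7: [9, 11, 11, 10],
}

# Transformed variate sqrt(x + 2), the constant 2 being that adopted in the text.
transformed = {t: [math.sqrt(x + 2) for x in xs] for t, xs in counts.items()}

n = sum(len(v) for v in transformed.values())
grand = sum(sum(v) for v in transformed.values())
correction = grand ** 2 / n

# The sum of squared transformed values is simply the sum of (x + 2).
total_ss = sum(x + 2 for xs in counts.values() for x in xs) - correction
between_ss = sum(sum(v) ** 2 / len(v) for v in transformed.values()) - correction
residual_ss = total_ss - between_ss

between_df = len(counts) - 1
residual_df = n - len(counts)
ratio = (between_ss / between_df) / (residual_ss / residual_df)

print(f"Between treatments {between_ss:.3f} on {between_df} d.f.")
print(f"Residual           {residual_ss:.3f} on {residual_df} d.f.")
print(f"Variance ratio     {ratio:.2f}")
```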


Randomisation 

23.36. Consider a two-way classification of pq members, the observed value of the
jth A-member of the kth B-class being $x_{jk}$. Following the line already considered in 21.48,
we will consider the z-distribution in the population of values obtained by permuting the
members in any A-class in all possible ways. There will thus be $(q!)^p$ possible values of
z, all based on the observed values. We have already considered a case of this kind in
dealing with the problem of m rankings (16.29) and we shall follow the same procedure
in solving the more general problem.


Let the values be arrayed as

$\begin{matrix}
x_{11} & x_{12} & \ldots & x_{1q} \\
x_{21} & x_{22} & \ldots & x_{2q} \\
\ldots & \ldots & \ldots & \ldots \\
x_{p1} & x_{p2} & \ldots & x_{pq}
\end{matrix}$    (23.58)

If $S_R$ is the sum of squares between rows, $S_C$ that between columns and $S$ the total, we
know that in the ordinary case considered earlier in the chapter, $S_C$ is distributed as $v\chi^2$
with $q - 1$ d.f., and $S - S_R - S_C$ as $v\chi^2$ with $(p - 1)(q - 1)$ d.f. It follows that

$W = \frac{S_C}{S - S_R}$, say,    (23.59)

is distributed in the Type I form

$dF \propto W^{\frac12(q-3)} (1 - W)^{\frac12(p-1)(q-1) - 1}\,dW$.    (23.60)

It is easier to work with W than with z, but there is of course no difficulty in passing from
one to the other.

We proceed to find the first four moments of W in the population of $(q!)^p$ values obtained
by permuting the rows of (23.58) in all possible ways.

23.37. If in (23.58) we increase the members of any row by a constant a, it is easily
seen that $S_C$ and $S - S_R$ remain unaffected, and hence so does W. Thus we may take
the mean of each row to be zero and then $S_R = 0$. With this origin we have

$W = \frac{S_C}{S} = \frac{\sum_j \left(\sum_i x_{ij}\right)^2}{p \sum_{i,j} x_{ij}^2}$.    (23.61)

If now

$F_{ik} = \sum_j x_{ij} x_{kj}$,    (23.62)

and the k-statistics of the q values $x_{ij}$, $j = 1 \ldots q$, are written $k_{i1}$, $k_{i2}$, etc., and

$U = \sum' F_{ik}$,    (23.63)

we find

$W = \frac{1}{p} + \frac{2U}{p(q-1)\sum k_{i2}}$    (23.64)

$E(F_{ik}) = 0$    (23.65)

$E(F_{ik}^2) = (q - 1)\,k_{i2} k_{k2}$    (23.66)

$E(F_{ik}^3) = \frac{(q-1)(q-2)}{q}\,k_{i3} k_{k3}$    (23.67)

$E(F_{ik}^4) = \frac{3(q-1)^3}{q+1}\,k_{i2}^2 k_{k2}^2 + \frac{(q-1)(q-2)(q-3)}{q(q+1)}\,k_{i4} k_{k4}$.    (23.68)
Then, for the moments of U,

$E(U) = 0$    (23.69)

$E(U^2) = (q - 1)\sum' k_{i2} k_{k2}$    (23.70)

$E(U^3) = 6(q - 1)\sum' k_{i2} k_{k2} k_{l2} + \frac{(q-1)(q-2)}{q}\sum' k_{i3} k_{k3}$    (23.71)

$E(U^4) = \frac{(q-1)(q-2)(q-3)}{q(q+1)}\sum' k_{i4} k_{k4} + \frac{3(q-1)^3}{q+1}\sum' k_{i2}^2 k_{k2}^2$
$\qquad + 3(q-1)^2\left\{\left(\sum' k_{i2} k_{k2}\right)^2 - \sum' k_{i2}^2 k_{k2}^2\right\}$
$\qquad + \frac{12(q-1)(q-2)}{q}\sum' k_{i3} k_{k3} k_{l2} + 72(q-1)\sum' k_{i2} k_{k2} k_{l2} k_{m2}$,    (23.72)

where $\sum'$ denotes summation over values for which the subscripts are unequal and permutations
are not allowed.
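Finally, for the moments of W we have

$E(W) = \bar{W} = \frac{1}{p}$    (23.73)

$E(W - \bar{W})^2 = \frac{4\sum' k_{i2} k_{k2}}{p^2 (q-1) \left(\sum k_{i2}\right)^2}$    (23.74)

$E(W - \bar{W})^3 = \frac{48\sum' k_{i2} k_{k2} k_{l2}}{p^3 (q-1)^2 \left(\sum k_{i2}\right)^3} + \frac{8(q-2)\sum' k_{i3} k_{k3}}{p^3 q (q-1)^2 \left(\sum k_{i2}\right)^3}$    (23.75)

$E(W - \bar{W})^4 = \frac{48\left(\sum' k_{i2} k_{k2}\right)^2}{p^4 (q-1)^2 \left(\sum k_{i2}\right)^4} - \frac{96\sum' k_{i2}^2 k_{k2}^2}{p^4 (q-1)^2 (q+1) \left(\sum k_{i2}\right)^4} + \frac{1152\sum' k_{i2} k_{k2} k_{l2} k_{m2}}{p^4 (q-1)^3 \left(\sum k_{i2}\right)^4}$
$\qquad + \frac{16(q-2)(q-3)\sum' k_{i4} k_{k4}}{p^4 q (q+1)(q-1)^3 \left(\sum k_{i2}\right)^4} + \frac{192(q-2)\sum' k_{i3} k_{k3} k_{l2}}{p^4 q (q-1)^3 \left(\sum k_{i2}\right)^4}$.    (23.76)

These formulae can be derived in the manner of 16.33, but reference may be made to
Pitman (1938) for further details.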
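
For a small table the permutation population can be enumerated completely, which affords a check on the first two moments. The sketch below is our own: the 3 × 3 array of figures is invented for illustration, the function names are hypothetical, and exact rational arithmetic is used so that the comparison with the mean 1/p and with the variance $4\sum' k_{i2}k_{k2} / \{p^2 (q-1) (\sum k_{i2})^2\}$ is free of rounding.

```python
from fractions import Fraction
from itertools import permutations, product


def w_statistic(rows):
    """W = S_C / (S - S_R) for a p x q table given as a sequence of rows."""
    p, q = len(rows), len(rows[0])
    grand = Fraction(sum(map(sum, rows)), p * q)
    s_total = sum((x - grand) ** 2 for row in rows for x in row)
    s_rows = q * sum((Fraction(sum(row), q) - grand) ** 2 for row in rows)
    s_cols = p * sum((Fraction(sum(col), p) - grand) ** 2 for col in zip(*rows))
    return s_cols / (s_total - s_rows)


def k2(row):
    """Second k-statistic of a row: sum of squares about the mean over (q - 1)."""
    q = len(row)
    mean = Fraction(sum(row), q)
    return sum((x - mean) ** 2 for x in row) / (q - 1)


# Invented illustrative data: p = 3 rows of q = 3 members.
data = [(3, 7, 8), (1, 4, 10), (2, 5, 6)]
p, q = len(data), len(data[0])

# The population of (q!)^p values of W obtained by permuting within each row.
values = [w_statistic(rows)
          for rows in product(*(list(permutations(r)) for r in data))]

mean_w = sum(values) / len(values)
var_w = sum((w - mean_w) ** 2 for w in values) / len(values)

k = [k2(r) for r in data]
pair_sum = sum(k[i] * k[j] for i in range(p) for j in range(i + 1, p))
var_formula = 4 * pair_sum / (p ** 2 * (q - 1) * sum(k) ** 2)

print(len(values), mean_w, var_w, var_formula)
```

Complete enumeration is practicable only for very small p and q; for larger tables the moments themselves must be relied upon.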

23.38. We now consider how far the first four moments of W, as found above, agree
with the first four moments of the distribution (23.60). The mean and variance of the
latter are

$\frac{1}{p}$ and $\frac{2(p-1)}{p^2 (pq - p + 2)}$.    (23.77)

The means agree exactly. For the variances to agree we must have, from (23.74) and
(23.77),

$\frac{4\sum' k_{i2} k_{k2}}{p^2 (q-1) \left(\sum k_{i2}\right)^2} = \frac{2(p-1)}{p^2 (pq - p + 2)}$.    (23.78)

Writing

$K = \frac{2\sum' k_{i2} k_{k2}}{\left(\sum k_{i2}\right)^2}$,    (23.79)

we find that (23.78) is equivalent to

$K = \frac{(p-1)(q-1)}{pq - p + 2}$.    (23.80)

The ratio K may have any value from 0 to $\frac{p-1}{p}$, the lower limit being approached when
one of the second k-statistics is much larger than the others, the upper limit when they
are all equal. Hence all that can be said about the variance of W is that it is not greater
than $\frac{2(p-1)}{p^3 (q-1)}$ and that it takes this value when the variance of each p-class is the same.

Turning to the third and fourth moments, we note that in many cases where the variation
is not too skew the quantities $k_{i3}$ and $k_{i4}$ will be negligible. A number of terms in
(23.75) and (23.76) may thus be neglected, but even those that remain are fairly complicated,
and it is difficult to say how far the distribution of W will approach the Type I
distribution (23.60). In practice the values may be worked out and compared. If there
is reasonable agreement, the z-distribution of the variance ratio will hold in the particular
population which we are considering.

23.39. A better approach is to find the Type I distribution which has the same first
two moments as $W$ and to modify the $z$-test where necessary. It may be shown that when
$K$ is not too small the third and fourth moments of $W$ and the fitted Type I distribution
are in fairly good agreement, so that we may expect a good fit.

The Type I distribution with mean $\frac{1}{p}$ and variance $\frac{2K}{p^2(q-1)}$ has the mean and variance
of $W$ by definition. Its third moment is easily seen to be


8K^ 

p^ {q - 1) 


. (23.81) 


p — 1 


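We have to see how far this differs from the actual third moment of $W$ given by (23.75).
Now

$$3\sum' k_{i2}k_{j2}k_{l2} = \sum k_{i2}\sum' k_{j2}k_{l2} - \sum_{i \neq j} k_{i2}^2 k_{j2}$$

$$= \sum k_{i2}\sum' k_{j2}k_{l2} - \left(\sum k_{i2}\sum k_{i2}^2 - \sum k_{i2}^3\right)$$

$$= \sum k_{i2}\left\{3\sum' k_{j2}k_{l2} - \left(\sum k_{i2}\right)^2\right\} + \sum k_{i2}^3,$$

and hence

$$\frac{6\sum' k_{i2}k_{j2}k_{l2}}{\left(\sum k_{i2}\right)^3} = 3K - 2 + \frac{2\sum k_{i2}^3}{\left(\sum k_{i2}\right)^3}. \tag{23.82}$$

Now, since the quantities concerned are positive,

$$\sum k_{i2}\sum k_{i2}^3 \geq \left(\sum k_{i2}^2\right)^2,$$

and hence

$$\frac{\sum k_{i2}^3}{\left(\sum k_{i2}\right)^3} \geq \left\{\frac{\sum k_{i2}^2}{\left(\sum k_{i2}\right)^2}\right\}^2 = (1-K)^2. \tag{23.83}$$

Hence, from (23.82) and (23.83),

$$\frac{6\sum' k_{i2}k_{j2}k_{l2}}{\left(\sum k_{i2}\right)^3} \geq 3K - 2 + 2(1-K)^2 = K^2\left(1 - \frac{1-K}{K}\right). \tag{23.84}$$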




Similarly, since

$$\frac{\sum k_{i2}^3}{\left(\sum k_{i2}\right)^3} \leq \left\{\frac{\sum k_{i2}^2}{\left(\sum k_{i2}\right)^2}\right\}^{3/2} = (1-K)^{3/2},$$

it appears that

$$\frac{6\sum' k_{i2}k_{j2}k_{l2}}{\left(\sum k_{i2}\right)^3} \leq K^2\,\frac{3+K}{4}. \tag{23.85}$$

On comparing (23.75) and (23.81), and assuming that the second term in the former may
be neglected, we see that they differ by the factor whose limits we have found in (23.84)
and (23.85), namely

$$1 - \frac{1-K}{K} \quad\text{and}\quad \frac{3+K}{4}.$$

If $K$ is not too small the limits are not very different from unity, and the third moments
are accordingly in fairly good agreement.

In the same way but with rather more complicated algebra it may be shown that the 
fourth moments are in fair agreement. 

When all the rows are rankings, the case reduces to that considered in 16.33 et seq., 
and we have already seen that the distribution of W is closely approximated by the Type I 
distribution in that case. 
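
As a rough numerical illustration of the limits in (23.84) and (23.85), the following Python sketch evaluates the factor $6\sum' k_{i2}k_{j2}k_{l2} / \{K^2(\sum k_{i2})^3\}$ for the eight class variances quoted in Example 23.8 below. The variable names are illustrative only, and the variances are used as stand-ins for the second $k$-statistics; this is a check of the inequalities under those assumptions, not part of the original derivation.

```python
from itertools import combinations

# Class variances quoted in Example 23.8 (used here as the second k-statistics).
k2 = [7628, 15702, 22669, 59732, 3666, 90593, 26297, 8672]

S = sum(k2)
# K of (23.79): twice the sum of products of distinct pairs over the squared total.
K = 2 * sum(a * b for a, b in combinations(k2, 2)) / S**2
# Sum of products of distinct triples.
triples = sum(a * b * c for a, b, c in combinations(k2, 3))

factor = 6 * triples / (S**3 * K**2)
lower = 1 - (1 - K) / K      # limit from (23.84)
upper = (3 + K) / 4          # limit from (23.85)

print(f"K = {K:.4f}, factor = {factor:.4f}, limits = ({lower:.4f}, {upper:.4f})")
```

For these figures the factor lies between the two limits, as the inequalities require.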


23.40. Suppose, now, that we have $p$ classes of objects, one of each class belonging
to a second series of classes, $q$ in number. As our hypothesis we will suppose that membership
of the $q$-classes is independent of the variate-values, so that we may suppose it to be
a matter of chance how the values in any $p$-class are distributed among the $q$-classes. On
this hypothesis the variance ratio will follow the $z$-form approximately (subject to the
conditions we have discussed above) in the population consisting of the $(q!)^p$ permutations
of observed values; and this will be so whether the parent is normal or not.

By shaping the inference in this way, and making it conditional, we are thus able to
apply the $z$-test even in cases of non-normality. The test of homogeneity still applies, but
of course the inference is rather different from the usual type. This point has not, perhaps,
been adequately emphasised in the past and there still seems to be confusion on the subject.

Randomised Blocks

23.41. The principle of testing in a conditional population has received its chief
applications in a certain type of agricultural experiment (and analogous cases in other
fields), known as a randomised block experiment. We are given $p$ blocks of land and wish
to test the existence of differential effects among $q$ treatments, e.g. manurial treatments,
of a crop to be grown on it. We divide each block into $q$ plots and grow the crops on each
of the $pq$ plots. In any one block we apply a different treatment to each of the $q$ plots;
and we allocate the treatments among the plots at random.

This randomisation is an essential part of the process. If the treatments exert no
effect the observed yields might have occurred in any order, and by making the inference
in the proper way we are able to test in the $z$-distribution without assuming parent normality
or the non-existence of fertility differences between plots of the same block. If,
of course, the parent is near to normality the test is strengthened. Had we not allocated
the treatments at random the use of the $z$-distribution would not have been valid in the
absence of normality (at least approximate) on the part of the parent.
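
The conditional argument can be sketched in code. The following Python fragment is a minimal illustration, not taken from the text: the yields are hypothetical, the function name `variance_ratio` is invented for the purpose, and the variance ratio itself (rather than $z$) is used as the criterion. It enumerates all $(q!)^p$ arrangements obtained by permuting the yields within each block and finds the proportion giving a treatment variance ratio at least as large as that observed.

```python
from itertools import permutations, product

# Hypothetical yields: 4 blocks (rows) by 3 treatments (columns).
yields = [
    [10.0, 12.0, 15.0],
    [11.0, 13.5, 15.5],
    [9.0, 12.5, 14.0],
    [10.5, 12.0, 16.0],
]

def variance_ratio(y):
    """Treatment mean square divided by residual mean square (blocks removed)."""
    p, q = len(y), len(y[0])
    grand = sum(map(sum, y)) / (p * q)
    block_means = [sum(row) / q for row in y]
    treat_means = [sum(y[j][k] for j in range(p)) / p for k in range(q)]
    ss_treat = p * sum((t - grand) ** 2 for t in treat_means)
    ss_resid = sum(
        (y[j][k] - block_means[j] - treat_means[k] + grand) ** 2
        for j in range(p) for k in range(q)
    )
    return (ss_treat / (q - 1)) / (ss_resid / ((p - 1) * (q - 1)))

observed = variance_ratio(yields)
q = len(yields[0])

n_total = 0
n_extreme = 0
# One permutation of the q plots per block: (q!)^p arrangements in all.
for perms in product(permutations(range(q)), repeat=len(yields)):
    arranged = [[row[i] for i in perm] for row, perm in zip(yields, perms)]
    n_total += 1
    if variance_ratio(arranged) >= observed - 1e-9:
        n_extreme += 1

p_value = n_extreme / n_total
print(n_total, n_extreme, p_value)
```

With larger experiments complete enumeration is impracticable, and a random sample of arrangements would be drawn instead, as in the sampling experiment of Eden and Yates described in Example 23.8.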
23.42. It is of some importance to make clear the exact hypothesis which is being
tested in this approach, since misunderstandings on the point have led to some rather
heated controversy. If the treatments are numbered 1 to $q$, we consider the possible yield
on the plot $(j, k)$ if it received the $l$th treatment, say $x_{jk(l)}$. In actual fact only one of these
treatments was carried out; the other values of $x_{jk(l)}$ are hypothetical and are based on
our conception of what would happen if the treatments were differently distributed. The
totality of values $x_{jk(l)}$ form our hypothetical population. We are supposing that the
observed yields can be expressed as

$$x_{jk(l)} = a_j + \xi_{jk(l)},$$

where $a_j$ is an effect differing from block to block but constant within blocks, and $\xi_{jk(l)}$ is the
"individual" plot effect which has a zero mean. The hypothesis we have considered in
arriving at the validity of the $z$-test in conditional inferences is that every treatment affects
every plot to the same extent, apart from the block effect $a_j$. In short, we suppose that
$\xi_{jk(l)}$ is the same for all $l$. This is the hypothesis usually tested in data from randomised
blocks.

Neyman (1935a) proposed an alternative hypothesis, viz. that the mean effects of
treatments over all blocks were the same, on the ground that we are interested in average
treatment effects when testing fertilisers, not the effect on particular plots. The hypothesis
here is that the mean of $\xi_{jk(l)}$ over all plots is the same for all $l$, which is not the same as
before; and it appears from Neyman's analysis that the $z$-distribution under randomisation
may not hold to such a satisfactory approximation as in the former case. Once again we have
to stress the importance of gaining a clear idea of the hypothesis under test.


Example 23.8 (Eden and Yates, 1933; Pitman, 1938)

Eden and Yates considered some data, based on actual experience of heights of wheat
shoots, comprising eight classes of four, equivalent to the following measurements:

Class     1      2      3      4      5      6      7      8

         433    455    487½   407½   452½   257½   434½   475½
         429    419½   389    574½   436½   263½   526½   473½
         383    479    463½   477½   415    392    470    423½
         437    504½   469½   452½   418    426    532    481½


The variances of the eight classes, in units of 1/12th, are then found to be

7628; 15,702; 22,669; 59,732; 3666; 90,593; 26,297; 8672.

The quantity $K$ of equation (23.79) is then found to be 0.7577. The quantity
$\frac{(p-1)(q-1)}{pq-p+2}$ is 0.8077. Thus (23.80) is approximately satisfied and we expect that the
$z$-distribution will be approximately reproduced by the data under random permutations.

This was confirmed by Eden and Yates in a sampling experiment on the data. 1000 
sets of permutations were taken and z calculated for each. Agreement with expectation 
was good. 
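
The arithmetic of this example is easily checked. The short Python sketch below (variable and function names are illustrative) computes $K$ from the eight quoted variances, using the identity $2\sum' k_{i2}k_{j2} = (\sum k_{i2})^2 - \sum k_{i2}^2$, and compares it with the right-hand side of (23.80) for $p = 8$, $q = 4$.

```python
# Variances of the eight classes as quoted in the example (units of 1/12th).
variances = [7628, 15702, 22669, 59732, 3666, 90593, 26297, 8672]
p, q = 8, 4  # eight classes of four measurements each

def k_ratio(k2):
    """K of (23.79), computed as 1 - sum(k^2) / (sum k)^2."""
    total = sum(k2)
    return 1.0 - sum(k * k for k in k2) / total**2

K = k_ratio(variances)
target = (p - 1) * (q - 1) / (p * q - p + 2)   # right-hand side of (23.80)
upper = (p - 1) / p                            # value of K when all variances are equal

print(f"K = {K:.4f}, (23.80) requires {target:.4f}, maximum possible {upper:.4f}")
```

Since $K$ is a ratio, the unit in which the variances are expressed does not affect the result.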


Example 23.9 (Friedman, 1937) 

A good example of data from populations which are probably far from normal is given
in Table 23.18, showing the standard deviations of expenditures on various items for seven
income-groups. The figures relate to families of wage-earners and lower salaried workers
in Minneapolis and St. Paul, U.S.A., in 1935-6.

TABLE 23.18

Standard Deviations of Expenditure on Certain Items of Families in Specified Income Groups.

(Figures in brackets are ranks.)

                                          Annual Family Income (dollars).
Category of Expenditure.   750-        1000-       1250-       1500-       1750-       2000-       2250-2500

Housing                   100.3 (5)    68.4 (1)    89.5 (3)    77.9 (2)   100.0 (4)   108.2 (6)   184.9 (7)
Household operation        42.2 (1)    44.3 (3)    60.9 (4)    73.9 (6)    43.9 (2)    61.7 (5)   102.3 (7)
Food                       71.3 (1)    81.9 (2)   100.7 (7)    86.5 (3)   100.3 (5)    90.7 (4)   100.6 (6)
Clothing                   37.6 (1)    60.0 (3)    57.0 (2)    60.8 (4)    71.8 (5)    83.0 (6)   117.1 (7)
Furnishings, etc.          58.3 (2)    52.7 (1)    96.0 (6)    60.4 (3)   104.3 (7)    89.8 (5)    85.8 (4)
Transportation             46.3 (1)    82.2 (2)   129.8 (3)   181.0 (6)   172.3 (5)   164.8 (4)   246.8 (7)
Recreation                 19.0 (1)    23.1 (2)    38.7 (3)    45.8 (4)    59.0 (7)    50.7 (5)    55.2 (6)
Personal care               8.3 (1)     8.4 (2)     9.2 (3)    14.3 (6)    10.6 (4)    15.8 (7)    12.5 (5)
Medical care               20.1 (1)    33.5 (2)    60.1 (4)    69.3 (5)   114.3 (7)    45.3 (3)   101.6 (6)
Education                   3.2 (1)     4.1 (2)    12.7 (4)    18.9 (5)     8.9 (3)    41.5 (6)    66.3 (7)
Community welfare           4.1 (1)    18.9 (5)     8.5 (2)    12.9 (3)    25.3 (7)    19.9 (6)    16.8 (4)
Vocation                    7.7 (1)    11.2 (5)    10.4 (2)    10.9 (4)    10.5 (3)    14.0 (6)    14.4 (7)
Gifts                       5.3 (1)    10.9 (2)    11.2 (3)    25.3 (4)    42.3 (5)    48.8 (6)    69.4 (7)
Other                       6.0 (5)     5.6 (4)    22.2 (7)     2.5 (2)     6.2 (6)     1.0 (1)     4.0 (3)


In brackets we show the ranks of the figure for different income-groups for each
category of expenditure. We wish to know whether the standard deviations for each
category differ significantly for the different income levels. On the hypothesis that they
do not, it is a matter of chance how the ranks fall.

The sums of ranks in each column are:

23, 36, 53, 57, 70, 70, 83.

The coefficient of concordance (vol. I, p. 411) is then $W = \frac{12S}{m^2(n^3-n)}$, where $m = 14$,
$n = 7$ and $S$ is the sum of squares of deviations of sums of ranks from the mean
56; we find that $S = 2620$ and $W = 0.4774$. We may test the significance
(vol. I, p. 419) by writing

$$z = \tfrac{1}{2}\log\frac{(m-1)W}{1-W} = 1.24,$$

$$\nu_1 = (n-1) - \frac{2}{m} = 5\tfrac{6}{7},$$

$$\nu_2 = (m-1)\,\nu_1 = 76\tfrac{1}{7}.$$

The value of $z$ is highly significant, and we conclude that standard deviation is related to
size of income: the more money there is to spend, the more variable is the expenditure
on particular items.
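
The calculation may be verified by a few lines of Python. The sketch below takes the seven rank sums as given in the example; the variable names are illustrative, and natural logarithms are assumed for $z$.

```python
import math

rank_sums = [23, 36, 53, 57, 70, 70, 83]  # column sums of ranks from Table 23.18
m = 14   # number of rankings (categories of expenditure)
n = 7    # number of objects ranked (income groups)

mean_sum = m * (n + 1) / 2                          # expected sum of ranks per column
S = sum((r - mean_sum) ** 2 for r in rank_sums)     # sum of squared deviations
W = 12 * S / (m**2 * (n**3 - n))                    # coefficient of concordance

z = 0.5 * math.log((m - 1) * W / (1 - W))
nu1 = (n - 1) - 2 / m
nu2 = (m - 1) * nu1

print(f"S = {S:.0f}, W = {W:.4f}, z = {z:.2f}, nu1 = {nu1:.4f}, nu2 = {nu2:.4f}")
```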
NOTES AND REFERENCES

The idea of comparing variance between classes with the variance within classes in
order to test homogeneity is found as early as Lexis (see footnote …). The modern
developments, and particularly the exact test of significance for normal parents, are
due to R. A. Fisher. Apart from papers by Irwin (1931 …), systematic accounts
of the theory of variance analysis are hard to find, many points of theoretical interest being
scattered among papers which are primarily practical.

For the general theory and applications reference may be made to Fisher's Statistical
Methods (1925a, 1944) and Design of Experiments (1935c, 1942), to a useful introductory
account by Goulden (1939), and to the writings of Yates, particularly his Design and Analysis
of Factorial Experiments (1937b).

On the question of randomisation in preserving the $z$-distribution see Eden and Yates
(1933), Welch (1937, 1938a), and Pitman (1938). References to work on ranking are given
at the end of Chapter 16.

For work on the distribution of the greatest of a set of variances see Fisher (1929a,
1940a), Cochran (1941), Stevens (1939a), Hartley (1938), and Finney (1941a). For further
work on the square-root and $\sin^{-1}$ transformations see Cochran (1940b), Beall (1942) and
Curtiss (1943).

The literature of this subject is now very large. Some further references are given
at the end of the next chapter.

EXERCISES 

23.1. If Xj {j = I . . . n) are a set of normal independent variab's with variances 
l/w .consider the transformation 


Uk 


it- 

^ hj V'^p 


where the Z’s are defined by 
hk = Vi'^k/^^) 


j=i 


hk — 


hk 

hk — 


Wk 

?-l 




w 


(1 4 (I") 


k 

:) 

k 


1 . . 

2, 3 

1 , 2 , 


Show that the Z’s are orthogonal and hence that 

n n 

Z't-Z 


j - 2, 3, 
k ~= j 

j - 2, 3, 
k • - j i I 


n 




N 


N 


Wj, Xj^ 

n 

is distributed as with n degrees of freedom. Noting that n, ^ ' a-,, x,. \ I'u- is dis 

4 - I 


tributed normally with unit variance independently of w. . . . slinw that 

n 

k=l 

is distributed as with n ~ 1 degrees of freedom. 
Hence derive the $z$-test for the analysis of variance with unequal numbers in a one-way
classification.


(Irwin, 1942.) 


23.2. Verify the arithmetic in the analysis of variance of Example 23.5. 

23.3. Verify the arithmetic in the analysis of variance of Example 23.6. 

23.4. In a bivariate table with k rows (different rows corresponding to different 

values of the .r- variate) write 

O’" X 

g = X K 4 ), 

O X 

where o'- is tlie variance of the y variate, the variance, and n^. the frequency in the row 
with variate -value .r. Thus 

Vnx _ ^ 

1 - 

and the ratio on the right is the variance-ratio in a one-way classification with unequal 
numbers. 

Show that, for any form of population, 


E (h) =-- k - 1 E {q) =N - k 

va,i- h = 2 («■ - 1) + (/?, - 3) ji:i- + 

L X -V J 

var q — 2 {N — k) + {(in — 3) Jx-i- N — 2^1- 

\.xn^ J 

cov {h, q) = Hi., — 3) jx-: - 1 + ^ — 27 —1. 

I ^ X ^xj 

Hence, appro.Kimately, that 

E _ A’ (A) [ YQxq _ COY {h,q) \ 

' Wy \ (q) E {h) E iq) ] 

u fi L _i q \ 

" \q) ~ W(q) 1 Eijh] ^ M(h)E {q) + J ' 

In the case wlnm all rows contain the same frequenc 3 '' 


and then 



N’ 





} 


var 


2 {k - 1) {N - 1) 
q) ~ (N - ky 


Hence show that the mean and variance of the variance-ratio are, to this order, independent
of the distribution of $y$, indicating that the $z$-test is not very sensitive to deviations from
normality.

(E. S. Pearson, 1931J). It is rather remarkable that the correlation of $h$ and $q$, far from
disturbing the $z$-distribution, contributes to its stability.)
THE ANALYSIS OF VARIANCE (2)

Estimation of Class-differences

24.1. In the previous chapter we considered the analysis of variance mainly as the 
provider of tests of homogeneity. We have now to examine in more detail the problem of 
estimating class-effects, assuming that the homogeneity tests have shown them to exist. 
We discuss in the first instance the case in which there is only one member in each sub- 
class, and for the sake of simplicity confine ourselves to a two-way classification, though 
the theory is quite general. 

The fundamental hypothesis to be examined is that the data may be expressed in
the form

$$x_{jk} = a_j + b_k + \zeta_{jk}, \tag{24.1}$$

where $a_j$ and $b_k$ represent class-effects and $\zeta$ is a random normal variate with zero mean.
Our analysis of variance will have shown whether this is an acceptable hypothesis, and
our present problem is to estimate the unknown values of $a$'s and $b$'s from the observed $x$'s.

24.2. The joint probability of the $\zeta$'s is

$$dF \propto \exp\left\{-\frac{1}{2v}\sum_{j,k}(x_{jk} - a_j - b_k)^2\right\} d\zeta_{11} \cdots d\zeta_{pq}, \tag{24.2}$$

where $v$ is the variance of $\zeta$, and in conformity with the notation used in the previous chapter
we have $p$ A-classes and $q$ B-classes. The maximum likelihood estimates of the $a$'s and
$b$'s are then those which minimise the sum in curly brackets in (24.2), that is to say, the
least-squares solution of the equations (24.1). In the usual way we find

$$\sum_{k=1}^{q}(x_{jk} - a_j - b_k) = 0, \qquad \sum_{j=1}^{p}(x_{jk} - a_j - b_k) = 0, \tag{24.3}$$

which reduce to

$$x_{j\cdot} - a_j - b_{\cdot} = 0, \qquad x_{\cdot k} - a_{\cdot} - b_k = 0. \tag{24.4}$$

Summing the first equation over $j$, dividing by $p$, and subtracting from the first, we obtain

$$x_{j\cdot} - x_{\cdot\cdot} = a_j - a_{\cdot}, \qquad j = 1, \ldots, p \tag{24.5}$$

and similarly

$$x_{\cdot k} - x_{\cdot\cdot} = b_k - b_{\cdot}, \qquad k = 1, \ldots, q. \tag{24.6}$$

In (24.5) there are $p$ equations, but if we sum them all we reach the identity $0 = 0$, so that
only $p - 1$ are independent. There is thus an element of indeterminacy which we may
remove by supposing that $a_{\cdot} = 0$. Similarly we may take $b_{\cdot} = 0$, and then we have

$$a_j = x_{j\cdot} - x_{\cdot\cdot}, \qquad j = 1, \ldots, p \tag{24.7}$$

$$b_k = x_{\cdot k} - x_{\cdot\cdot}, \qquad k = 1, \ldots, q. \tag{24.8}$$
Thus the estimate of any class-effect is equal to the deviation of the mean in that class from
the general mean. The same is true of the general $n$-way classification. We shall
see later that it is not true for unequal numbers in subclasses, except in a special
case when the numbers are proportionate.

24.3. The conditions $a_{\cdot} = 0$ and $b_{\cdot} = 0$ impose no real restriction on generality.
We may, if we prefer, consider the slightly more general
hypothesis that $\zeta$ has a mean $m$, in which case we have to minimise

$$\sum_{j,k}(x_{jk} - a_j - b_k - m)^2. \tag{24.9}$$

This will be found to lead back to equations (24.7) and (24.8), with the additional equation
for estimating $m$

$$m = x_{\cdot\cdot}. \tag{24.10}$$

Or again, if we prefer to absorb $m$ into the $a$-effects we have

$$a_j = x_{j\cdot}, \qquad b_k = x_{\cdot k} - x_{\cdot\cdot}, \tag{24.11}$$

the mean of $a_j$ in this case not vanishing. Which form we use is a matter of
convenience.


24.4. It is important to notice that the equations of estimation which we have just
reached give each $a_j$ and $b_k$ independently of values in other classes. We obtain the same
equation for $a_j$ whether we happen to be estimating other $a$'s and $b$'s or not. This property,
as we shall see shortly, fails to hold if the numbers in subclasses are disproportionate.

The situation is similar to that in which we can determine the constants in a regression
line independently of the others if orthogonal polynomials are used, in that each constant
is given by a separate equation not containing any of the others. Data of this kind are
called orthogonal.

The direct comparison of class-means which is possible with orthogonal data can be
seen, from general considerations, to be legitimate. In comparing $x_{i\cdot} - x_{\cdot\cdot}$ with $x_{j\cdot} - x_{\cdot\cdot}$,
the estimates of the effects in the $i$th and $j$th A-classes, we are in each case averaging over
$q$ B-classes with one member in each. The B-classes, therefore, affect each mean to the
same extent and do not affect their difference. If there are more members in some subclasses
than in others, the means are unequally weighted with different B-effects and
the comparison is invalidated.

24.5. Regarding $x_{j\cdot} - x_{\cdot\cdot}$ as the estimate of $a_j$ and $x_{\cdot k} - x_{\cdot\cdot}$ as the estimate of $b_k$,
we see that the familiar equation

$$\sum_{j,k}(x_{jk} - x_{\cdot\cdot})^2 = \sum_{j,k}(x_{j\cdot} - x_{\cdot\cdot})^2 + \sum_{j,k}(x_{\cdot k} - x_{\cdot\cdot})^2 + \sum_{j,k}(x_{jk} - x_{j\cdot} - x_{\cdot k} + x_{\cdot\cdot})^2 \tag{24.12}$$

can be regarded as an analysis of the sum of squares on the left, which has $pq - 1$ degrees
of freedom, into terms in which there is one degree of freedom for every fitted constant and
a residual with $(p - 1)(q - 1)$ degrees of freedom. Every constant fitted reduces the
number of degrees of freedom in the residual by unity.
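
The estimates (24.7) and (24.8) and the analysis (24.12) can be illustrated numerically. The Python sketch below uses a small hypothetical two-way table with one member in each subclass (the figures and names are invented for illustration) and confirms that the total sum of squares about the general mean splits into the parts due to the $a$'s, the $b$'s and the residual.

```python
# Hypothetical two-way table: p = 3 A-classes (rows), q = 4 B-classes (columns).
x = [
    [12.0, 15.0, 11.0, 14.0],
    [10.0, 13.0, 10.0, 11.0],
    [14.0, 18.0, 12.0, 16.0],
]
p, q = len(x), len(x[0])

grand = sum(map(sum, x)) / (p * q)                                  # x..
row_mean = [sum(row) / q for row in x]                              # x_j.
col_mean = [sum(x[j][k] for j in range(p)) / p for k in range(q)]   # x_.k

a = [rm - grand for rm in row_mean]   # estimates (24.7)
b = [cm - grand for cm in col_mean]   # estimates (24.8)

cells = [(j, k) for j in range(p) for k in range(q)]
ss_total = sum((x[j][k] - grand) ** 2 for j, k in cells)
ss_a = sum(a[j] ** 2 for j, k in cells)   # equals q * sum of a_j^2
ss_b = sum(b[k] ** 2 for j, k in cells)   # equals p * sum of b_k^2
ss_resid = sum((x[j][k] - row_mean[j] - col_mean[k] + grand) ** 2 for j, k in cells)

print(ss_total, ss_a + ss_b + ss_resid)
print("d.f.:", p * q - 1, "=", p - 1, "+", q - 1, "+", (p - 1) * (q - 1))
```

Each estimate depends only on the mean of its own class and the general mean, which is the property of orthogonality discussed in 24.4.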
Unequal Numbers in Subclasses

24.6. For a one-way classification we have already considered (23.7 and 23.8) the 
case where the numbers in subclasses are unequal. It was seen that the total sum of squares 
could be expressed as a sum between classes and a residual which were independently 
distributed and whose ratio therefore provided a homogeneity test in the usual way. 

When we try to extend this result to two-way or generally to $n$-way classifications,
we begin to run into difficulties. We can still find, as shown below, an estimator of $v$ based
on $p - 1$ degrees of freedom and differences between A-classes, and one with $q - 1$ d.f.
based on differences between B-classes; but these are no longer independent, and consequently
we cannot subtract their sum from the total sum of squares in order to obtain
a residual or an interaction term which also provides an unbiassed estimator.

On the other hand, there is now available an independent estimator of $v$ which did
not appear in the orthogonal case where only one member was included in each subclass.
In fact, since there are several members in any given subclass, we can find an estimator
of $v$ based on those members alone; and we may pool all such to form an estimator with
$N - pq$ degrees of freedom, where there are $pq$ subclasses. This estimator will be independent
of subclass means and any estimators based on them, and hence provides
a "residual" such as we require to carry out homogeneity tests.

24.7. Suppose we have a two-way classification into $p$ A-classes and $q$ B-classes, and
let the number of members in the $(j, k)$th subclass be $n_{jk}$. Let $x_{jk}$ be the mean of these
members. We may array the means as

$$\begin{matrix} x_{11} & x_{12} & \cdots & x_{1q} \\ x_{21} & x_{22} & \cdots & x_{2q} \\ \vdots & \vdots & & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pq} \end{matrix} \tag{24.13}$$

Now we may, in the first instance, test for homogeneity by ignoring the differences
between A- and B-classification and merely regarding the data as a one-way classification
with $pq$ classes. The usual test for homogeneity is then applicable. The sum of squares
between means of classes will have $pq - 1$ degrees of freedom, the total $N - 1$ d.f., and
the residual $N - 1 - (pq - 1) = N - pq$ d.f. This residual, in fact, is the one mentioned
in the previous section, and is based on the pooled sums of squares within the $pq$
classes. The other term based on $pq - 1$ degrees of freedom is the weighted sum of squares
of the subclass means about the general mean $x$,

$$\sum_{j,k} n_{jk}(x_{jk} - x)^2,$$

and is derivable from the array (24.13).

24.8. To test the effect of A-classification separately we proceed as follows:
Any $x_{jk}$ is the mean of $n_{jk}$ values and, on the usual hypothesis as to normality, will
have variance $\frac{v}{n_{jk}}$. If $x$ is the mean of all $N$ values we have

$$x = \frac{1}{N}\sum_{j,k} n_{jk}x_{jk}. \tag{24.14}$$
Let the marginal unweighted means in (24.13) be $x_{j\cdot}$, $x_{\cdot k}$, so that

$$x_{j\cdot} = \frac{1}{q}\sum_k x_{jk}, \qquad x_{\cdot k} = \frac{1}{p}\sum_j x_{jk}. \tag{24.15}$$

On the hypothesis of homogeneity the variance of $x_{j\cdot}$ is given by

$$\frac{v}{q^2}\sum_k \frac{1}{n_{jk}} = \frac{v}{N_j}, \tag{24.16}$$

where

$$\frac{1}{N_j} = \frac{1}{q^2}\sum_k \frac{1}{n_{jk}}. \tag{24.17}$$

Now let us regard the means $x_{j\cdot}$ as the means in $p$ classes whose numbers are $N_j$, as
is legitimate from (24.16). Then writing

$$c = \frac{\sum_j N_j x_{j\cdot}}{\sum_j N_j} \tag{24.18}$$

we have for an unbiassed estimator of $v$

$$\frac{1}{p-1}\left\{\sum_j N_j x_{j\cdot}^2 - c^2\sum_j N_j\right\}. \tag{24.19}$$

This estimator has $p - 1$ degrees of freedom and is distributed as $\chi^2$. (This follows from
the one-way case except that $N_j$ may not be integral; and its general truth may be established
as in Exercise 23.1.) It is independent of the residual with $N - pq$ d.f., and hence
the A-effects may be tested separately.

Similarly, if

$$\frac{1}{M_k} = \frac{1}{p^2}\sum_j \frac{1}{n_{jk}}, \tag{24.20}$$

an unbiassed estimator of $v$ is given by

$$\frac{1}{q-1}\left\{\sum_k M_k x_{\cdot k}^2 - d^2\sum_k M_k\right\}, \tag{24.21}$$

where

$$d = \frac{\sum_k M_k x_{\cdot k}}{\sum_k M_k}, \tag{24.22}$$

and this also may be compared with the independent estimator based on $N - pq$ d.f.

Example 24.1 (data from Brandt (1933) considered by Yates (1934a))

Table 24.1 shows, for a number of breeds of pig, the numbers of each breed,
divided into male and female, and the total logarithm of the percentage bacon yielded by
the slaughtered carcases. The logarithm has been taken so as to normalise the variate.

TABLE 24.1

Numbers and Logarithm of Percentage Bacon in Breeds of Pig.

                          Female.                       Male.
Breed.            Number.   Log. Percent.      Number.   Log. Percent.
                               Bacon.                       Bacon.

Hampshire            33        66.55              89        181.04
Duroc Jersey         51        98.69             141        281.43
Tamworth             13        25.90              17         34.20
Yorkshire             4         7.62               9         17.58
Berkshire             8        14.64               4          8.20
Poland China         15        28.11              32         64.42
Chester White        35        66.90              47         90.52
Others               12        23.32              23         46.70

Totals              171       331.73             362        724.09
The total sum of squares, which is not obtainable from this table as it stands, we quote
as 13.0142.

The class-means and reciprocals of class-frequencies are given in Table 24.2.


TABLE 24.2

Class-Means and Reciprocals of Class-Frequencies for the Data of Table 24.1.

                              Female.                  Male.             Unweighted
Breed.                   Mean.      1/n_jk       Mean.      1/n_jk        Mean of
                                                                          Means.

Hampshire               2.016667   0.03030      2.034158   0.01124       2.025412
Duroc Jersey            1.935099   0.01961      1.995958   0.00709       1.965528
Tamworth                1.992307   0.07692      2.011765   0.05882       2.002036
Yorkshire               1.905000   0.25000      1.953333   0.11111       1.929167
Berkshire               1.830000   0.12500      2.050000   0.25000       1.940000
Poland China            1.874000   0.06667      2.013125   0.03125       1.943562
Chester White           1.911429   0.02857      1.925958   0.02128       1.918694
Others                  1.943333   0.08333      2.030434   0.04348       1.986884

Unweighted Mean of      1.925979   (Total)      2.001841   (Total)       1.963910
Means                              0.68040                 0.53427
Taking first the classification into male and female ($p = 8$), we find, from the relations

$$\frac{1}{N_{\cdot k}} = \frac{1}{p^2}\sum_j \frac{1}{n_{jk}},$$

$$N_{\cdot 1} = \frac{64}{0.68040} = 94.0623, \qquad N_{\cdot 2} = \frac{64}{0.53427} = 119.7896.$$

Then, from (24.18),

$$c = \frac{(94.0623 \times 1.925979) + (119.7896 \times 2.001841)}{94.0623 + 119.7896} = 1.968474.$$

Thus our estimate of $v$, with one degree of freedom,

$$= \sum_k \left(N_{\cdot k}\,x_{\cdot k}^2\right) - c^2\sum_k N_{\cdot k} = 0.3032.$$

Similarly for the eight breed-classes we find an estimate of $v$ with seven degrees of
freedom to be $\frac{0.6056}{7} = 0.0865$.

7 

Considering the 16 subclasses as a one-way classification, we find the following
preliminary analysis (the arithmetical details of which we omit):

TABLE 24.3

Analysis of Variance of Data in Table 24.1.

                        Sum of Squares.     d.f.     Quotient.

Between classes             1.2715           15       0.0848
Residual                   11.7427          517       0.0227

Totals                     13.0142          532

The variance ratio here gives a value of $z$ equal to 0.659, which is significant. Thus the
data are not homogeneous.

We now require to decide whether the departure from homogeneity is due to either
breed or sex or to a combination of the two. For sex-differences we have found an estimate
of $v$ equal to 0.3032 with one d.f. Comparing this with the independent residual from
Table 24.3 of 0.0227 with 517 d.f., we find that the effect of sex is significant. Similarly,
for breed, the estimate of $v$ is 0.0865 for 7 d.f., which again is significant. We conclude
that both breed and sex influence the departure from homogeneity.
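
The estimates used above can be reproduced from the figures of Table 24.1. The following Python sketch is illustrative only: the function name `unweighted_estimate` and the layout of the data are assumptions, and the sum $\sum N x^2 - c^2 \sum N$ of (24.19) is computed in the algebraically equivalent form $\sum N (x - c)^2$ to reduce rounding error.

```python
# Table 24.1: numbers and totals of log percentage bacon, female and male, by breed.
n_f = [33, 51, 13, 4, 8, 15, 35, 12]
n_m = [89, 141, 17, 9, 4, 32, 47, 23]
t_f = [66.55, 98.69, 25.90, 7.62, 14.64, 28.11, 66.90, 23.32]
t_m = [181.04, 281.43, 34.20, 17.58, 8.20, 64.42, 90.52, 46.70]

mean_f = [t / n for t, n in zip(t_f, n_f)]
mean_m = [t / n for t, n in zip(t_m, n_m)]

def unweighted_estimate(means, counts):
    """Estimate of v from unweighted marginal means, after (24.15) to (24.19).

    means[i] and counts[i] hold the subclass means and numbers of the i-th
    class under test, taken across the r classes of the other classification.
    """
    r = len(means[0])
    weights = [r * r / sum(1.0 / n for n in row) for row in counts]  # the N of (24.17)
    xbar = [sum(row) / r for row in means]                           # unweighted means
    c = sum(w * x for w, x in zip(weights, xbar)) / sum(weights)     # (24.18)
    ss = sum(w * (x - c) ** 2 for w, x in zip(weights, xbar))
    return ss / (len(means) - 1), weights, c

sex_est, sex_weights, sex_c = unweighted_estimate([mean_f, mean_m], [n_f, n_m])
breed_est, _, _ = unweighted_estimate(
    [list(pair) for pair in zip(mean_f, mean_m)],
    [list(pair) for pair in zip(n_f, n_m)],
)

residual = 0.0227  # residual quotient with 517 d.f. from Table 24.3
print(f"sex:   {sex_est:.4f} (ratio to residual {sex_est / residual:.1f})")
print(f"breed: {breed_est:.4f} (ratio to residual {breed_est / residual:.1f})")
```

Small differences in the last figure from the values printed in the text are to be expected, since the text works with rounded reciprocals.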
It is particularly important to note that since the estimates between breeds and between
sex are dependent, we cannot analyse the variance as follows:

TABLE 24.4

Incorrect Form of Analysis of Variance of Data of Table 24.1

                        Sum of Squares.     d.f.     Quotient.

Between sexes               0.3032            1       0.3032
Between breeds              0.6056            7       0.0865
"Interaction"               0.3627            7       0.0518
Residual                   11.7427          517       0.0227

Totals                     13.0142          532

In fact the term shown as "interaction", calculated so as to make the sums of squares
and degrees of freedom additive in the usual way, is not an unbiassed estimate of $v$. This
is a critical point of difference between the orthogonal and the non-orthogonal case.


24.9. Suppose that the homogeneity test has shown the existence of significant
class-effects. As before, we turn to consider the hypothesis that the data can be expressed
as the sum of A- and B-effects separately with a random normal residual. Let $x_{jkl}$ be
the typical member of the $(j, k)$th subclass, $l$ varying from 1 to $n_{jk}$. Our hypothesis is then

$$x_{jkl} = a_j + b_k + \zeta_{jkl}, \tag{24.23}$$

where $\zeta$ is normal with variance $v$. For convenience we will regard the mean of $\zeta$ as absorbed
in the coefficients $a$, so that we may take $\zeta$ to have zero mean.

The usual process of estimation of the $a$'s and $b$'s leads to the minimisation of the
sum over all $N$ values of

$$\sum (x_{jkl} - a_j - b_k)^2. \tag{24.24}$$

Differentiating with respect to $a_j$ and $b_k$, we find the series of equations

$$\sum_k \sum{}' (x_{jkl} - a_j - b_k) = 0, \qquad j = 1, \ldots, p$$

$$\sum_j \sum{}' (x_{jkl} - a_j - b_k) = 0, \qquad k = 1, \ldots, q \tag{24.25}$$

where $\sum'$ denotes summation over the $n_{jk}$ values in a subclass. These equations reduce to

$$a_j\sum_k n_{jk} + \sum_k n_{jk}b_k = \sum_k n_{jk}x_{jk},$$

$$\sum_j n_{jk}a_j + b_k\sum_j n_{jk} = \sum_j n_{jk}x_{jk}.$$

Writing $N_{j\cdot}$ for $\sum_k n_{jk}$ and $N_{\cdot k}$ for $\sum_j n_{jk}$, we have

$$N_{j\cdot}\,a_j + \sum_k n_{jk}b_k = \sum_k n_{jk}x_{jk}, \qquad j = 1, \ldots, p \tag{24.26}$$

$$\sum_j n_{jk}a_j + N_{\cdot k}\,b_k = \sum_j n_{jk}x_{jk}, \qquad k = 1, \ldots, q. \tag{24.27}$$

To which we may add

$$\sum_k b_k = 0. \tag{24.28}$$

Had we chosen to absorb the mean of $\zeta$ into the $b$'s, this last equation would be replaced
by $\sum_j a_j = 0$.

When all the $n$'s are equal these equations reduce to the orthogonal case, and each
$a$- or $b$-coefficient can be independently estimated. In the contrary case the equations
have to be solved as they stand.


Example 24.2

Returning to the data of Table 24.1, we find for equations (24.26) and (24.27) the
following, the values of the constants required being obtainable from the body or marginal
sums of the table itself:

171a_1 + 33b_1 + 51b_2 + 13b_3 + 4b_4 + 8b_5 + 15b_6 + 35b_7 + 12b_8 = 331.73
362a_2 + 89b_1 + 141b_2 + 17b_3 + 9b_4 + 4b_5 + 32b_6 + 47b_7 + 23b_8 = 724.09
 33a_1 +  89a_2 + 122b_1 = 247.59
 51a_1 + 141a_2 + 192b_2 = 380.12
 13a_1 +  17a_2 +  30b_3 =  60.10
  4a_1 +   9a_2 +  13b_4 =  25.20
  8a_1 +   4a_2 +  12b_5 =  22.84
 15a_1 +  32a_2 +  47b_6 =  92.53
 35a_1 +  47a_2 +  82b_7 = 157.42
 12a_1 +  23a_2 +  35b_8 =  70.02

To which we may add a_1 + a_2 = 0.

The solutions are

-a_1 = a_2 = 0.026507;

b_1 = 2.017259; b_2 = 1.967367; b_3 = 1.999799; b_4 = 1.928267;
b_5 = 1.912169; b_6 = 1.959136; b_7 = 1.915877; b_8 = 1.992241.

These give us the "best" estimates of the mean effects of sex and breed on the
hypothesis expressed by (24.23).

The mean of the $b$'s is 1.961514, which may be taken as an estimate of the mean of $\zeta$,
the $b$-effects then being the differences of the above $b$-values from this mean.
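
The solution of these equations can be checked by direct elimination. In the Python sketch below (names illustrative, data as in Table 24.1) the condition $a_1 + a_2 = 0$ is used to write $a_2 = -a_1 = a$; each breed equation then gives $b_k$ in terms of $a$, and substitution in the first equation leaves a single linear equation for $a$. This is one convenient route for a two-class factor, not necessarily the method by which the printed values were obtained.

```python
# Data of Table 24.1: numbers and totals of log percentage bacon, by breed.
n_f = [33, 51, 13, 4, 8, 15, 35, 12]
n_m = [89, 141, 17, 9, 4, 32, 47, 23]
t_f = [66.55, 98.69, 25.90, 7.62, 14.64, 28.11, 66.90, 23.32]
t_m = [181.04, 281.43, 34.20, 17.58, 8.20, 64.42, 90.52, 46.70]

N = [f + m for f, m in zip(n_f, n_m)]   # numbers in each breed
T = [f + m for f, m in zip(t_f, t_m)]   # totals for each breed
d = [m - f for f, m in zip(n_f, n_m)]   # male minus female numbers

# Breed equations with a2 = -a1 = a:  b_k = (T_k - d_k * a) / N_k.
# Substituting in  171 a1 + sum(n_f[k] * b_k) = 331.73  gives a linear equation in a.
num = sum(t_f) - sum(f * t / n for f, t, n in zip(n_f, T, N))
den = -sum(n_f) - sum(f * dk / n for f, dk, n in zip(n_f, d, N))
a = num / den                            # a2 = a, a1 = -a

b = [(t - dk * a) / n for t, dk, n in zip(T, d, N)]
mean_b = sum(b) / len(b)

print(f"a2 = -a1 = {a:.6f}")
print("b =", [round(v, 6) for v in b])
print(f"mean of b's = {mean_b:.6f}")
```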

24.10. Let us now consider the analysis of variance in the non-orthogonal case,
when constants have been fitted by least squares in the above-mentioned way.

To make the discussion clearer we will regard the estimation as relating to $p$ constants
$a_j$, related by $\sum (a_j) = 0$, $q$ constants $b_k$, related by $\sum (b_k) = 0$, and the mean $m$. There
are thus $p + q - 1$ independent constants which, in effect, provide estimates of the means
of subclasses. Whatever these means really are, the residual quotient based on $N - pq$
degrees of freedom gives an unbiassed estimator of $v$, the common variance. We have
now to analyse the remaining sum of squares based on $pq - 1$ d.f.

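If the true (population) values of the constants are denoted by $\alpha_j$, $\beta_k$ and $\mu$, the sum

$$\sum (x_{jkl} - \alpha_j - \beta_k - \mu)^2$$

is distributed as $v\chi^2$ with $N$ degrees of freedom. Developing yet another variation on
a familiar theme, we show that the corresponding quantity

$$\sum (x_{jkl} - a_j - b_k - m)^2 = \sum (x_{jkl} - \alpha_j - \beta_k - \mu)^2 - \sum (a_j - \alpha_j)^2 - \sum (b_k - \beta_k)^2 - \sum (m - \mu)^2 \tag{24.29}$$

is distributed as $v\chi^2$ with $N - (p + q - 1)$ d.f.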
In fact, equations (24.26) and (24.27) show that the estimators $a$, $b$ (and in our present
case $m$ also) are linear in the variables $x$. We can then find $p + q - 1$ orthogonal normal
variables in terms of which they can be expressed. Their sum of squares will be distributed
as $v\chi^2$ with $p + q - 1$ degrees of freedom (not some multiple of $v\chi^2$ because the mean value
must be $(p + q - 1)v$ in virtue of 18.17). Thus the remaining term $\sum (x_{jkl} - a_j - b_k - m)^2$
is distributed as $v\chi^2$ with $N - (p + q - 1)$ degrees of freedom, independently of the portion
due to the constants $a$, $b$ and $m$.

Furthermore, the actual reduction in sums of squares, equivalent to the sum of the
last three terms in (24.29), may be easily determined. Precisely as in the similar problem
of evaluating residuals in a regression equation, we have

$$\sum (x_{jkl} - a_j - b_k - m)^2 = \sum x_{jkl}^2 - \sum_j a_j \sum_{k,l} x_{jkl} - \sum_k b_k \sum_{j,l} x_{jkl} - m\sum x_{jkl}, \tag{24.30}$$

where, of course, summation takes place over all values.

24.11. The total sum of squares is already calculated about the estimated mean
$m$, so that the reduction for the term in $m$ has already been taken into account.
The total sum is then distributed as $v\chi^2$ with $N - 1$ d.f., as we already know. We know
further that we can split off the independent residual sum based on $N - pq$ degrees of
freedom. This leaves us with a sum based on $pq - 1$ d.f. From the previous section it
follows that we can analyse this sum into two parts: (a) the sum of squares due to fitting
the constants $a_j$ and $b_k$, accounting for $p + q - 2$ d.f., and (b) the remainder based on
$pq - 1 - (p + q - 2) = (p - 1)(q - 1)$ d.f. This remainder is independent of the sum
of squares due to fitting constants and provides an unbiassed estimator of $v$. If the ratio,
as compared with the residual based on $N - pq$ d.f., is significant, the hypothesis of additive
effects breaks down. In short, we may regard this quantity as an interaction term.
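The partition of degrees of freedom just described can be checked mechanically. The following is a minimal sketch only; the function name and the dictionary keys are our own labels, and the figure of 533 observations is inferred from the 517 residual d.f. and 16 subclasses of the running example.

```python
def two_way_degrees_of_freedom(n_obs, p, q):
    """Split the n_obs - 1 total d.f. of a p x q classification as in 24.11."""
    between_classes = p * q - 1      # between the pq subclass means
    constants = p + q - 2            # fitting the a_j and b_k
    return {
        "total": n_obs - 1,
        "within_subclasses": n_obs - p * q,
        "between_classes": between_classes,
        "constants": constants,
        "interaction": between_classes - constants,  # equals (p - 1)(q - 1)
    }


# Running example: two sexes, eight breeds, 533 animals (assumed from 517 + 16).
parts = two_way_degrees_of_freedom(n_obs=533, p=2, q=8)
print(parts)
```

The interaction entry is obtained by subtraction, so the identity $pq - 1 - (p + q - 2) = (p - 1)(q - 1)$ can be confirmed for any $p$ and $q$ the reader cares to try.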

24.12. One important point to notice in this connection is that the interaction term
depends on whether $p + q - 2$ or fewer constants are fitted. In the orthogonal case we
can determine an interaction term once and for all, however things stand in regard to the
estimation of inter-class effects; but for non-orthogonal data the number of class-effects
estimated affects the interaction term, and if necessary a new significance test has to be
applied if further estimates are calculated. The situation is similar to the testing of
regression coefficients when orthogonal polynomials are not employed.


Example 24.3

Returning again to the data discussed in Examples 24.1 and 24.2, let us regard the
means in all 16 subclasses as simultaneously under estimate. For the reduction in sum
of squares due to the constants we find, using the values of $a$ and $b$ found in Example 24.2,

$$ 0.026507\,(-331.73 + 724.09) + (2.017259 \times 247.59) + (1.967367 \times 380.12) + \ldots - \frac{(1055.82)^2}{533} = 1.04146. $$

Here, for instance, the sum $\sum_j a_j \sum_{k,l} x_{jkl}$ is given by multiplying $a$ by the
difference of the two sex totals already found. The last term removes the effect of including
the mean among the $b$'s.

The sum of squares between classes was found in Example 24.1 to be 1.2715, based
on 15 d.f. We then have

                                             Sum of Squares.    d.f.    Quotient.
Sex and breed (estimation of constants)          1.0415           8      0.1302
Interaction                                      0.2300           7      0.0329
Between classes                                  1.2715          15

Comparing the interaction term 0.0329 (7 d.f.) with the residual 0.0229 (517 d.f.) we see
that it is not significant.
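The arithmetic of this small table may be retraced as follows. This is a sketch under the figures printed above; the variable names are ours, and no significance point is computed, the ratio being left for reference to tables of the variance-ratio distribution.

```python
def quotient(sum_of_squares, degrees_of_freedom):
    """Mean square: sum of squares divided by its degrees of freedom."""
    return sum_of_squares / degrees_of_freedom


# Figures as printed in Example 24.3.
between_classes_ss, between_classes_df = 1.2715, 15
constants_ss, constants_df = 1.0415, 8
residual_quotient = 0.0229  # residual within subclasses, 517 d.f.

# Interaction is what remains between classes after fitting the constants.
interaction_ss = between_classes_ss - constants_ss
interaction_df = between_classes_df - constants_df
interaction_quotient = quotient(interaction_ss, interaction_df)

variance_ratio = interaction_quotient / residual_quotient
print(round(interaction_ss, 4), interaction_df, round(interaction_quotient, 4))
print(round(variance_ratio, 2))
```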

If we neglect sex and consider breed alone, we have only to estimate eight constants
$b_1 \ldots b_8$ subject to $\sum (b) = 0$. The sum of squares for breed alone is given by

$$ \frac{1}{n_{.1}} (247.59)^2 + \frac{1}{n_{.2}} (380.12)^2 + \ldots - \frac{1}{533} (1055.82)^2 = 0.7253. $$

Similarly the sum of squares for sex alone will be found to be 0.4224. We have the
following analysis:

TABLE 24.5

Further Analysis of Variance of Data of Table 24.1.

                                             Sum of Squares.    d.f.    Quotient.
Test for Sex
  Between breed (estimation of constants)        0.7253           7        -
  Sex                                            0.3162           1      0.3162
  Sex and breed                                  1.0415           8        -
Test for Breed
  Between sex (estimation of constants)          0.4224           1        -
  Breed                                          0.6191           7      0.0884
  Sex and breed                                  1.0415           8        -
Interaction                                      0.2300           7      0.0329
Between classes                                  1.2715          15

Here, for instance, if we test for sex there are seven independent constants for breed
and one for sex, the latter being the only one that interests us; and similarly for breed.
On comparison with the residual 0.0227 both sex and breed are found to be significant.

24.13. The reader may perhaps find the various tests of Examples 24.1 and 24.3
confusing, and we accordingly summarise our results for the case of unequal numbers in
subclasses.

In every case, except where each subclass contains not more than one member, an
estimate of the common variance $v$ may be obtained, with $N - pq$ d.f., by pooling the
sums of squares within the $pq$ subclasses. Call this $v_1$.

Homogeneity may then be tested (a) by considering the classes as a single one-way
classification and comparing the quotient between means with $v_1$, or (b) by calculating
for either classification separately the estimates based on (24.19) and comparing them
with $v_1$.

If homogeneity is rejected in favour of the additive effect of classes expressed by the
usual hypothesis, the sum of squares between all classes based on $pq - 1$ d.f. may be split
into independent sums related to the fitting of the constants and to an interaction term.
The latter can be compared with $v_1$ to test for interaction. If this is not significant,
alternative tests for effects between A- and between B-classes may be derived by testing the
sum of squares attributable to the fitting of the respective constants against $v_1$. These
tests are, in effect, tests of one class neglecting the effect of the other, and may not be
accurate if the latter effect is not negligible. It is probably better to fit constants to both
classes simultaneously in the first instance.


Proportionate Frequencies 

24.14. We have previously spoken of non-orthogonal data as meaning any classification
with unequal frequencies in the subclasses, but there is one other case of unequal
frequencies for which orthogonality exists, namely the one in which frequencies are
proportionate, i.e. there are marginal frequencies $l_j$, $m_k$ such that

$$ n_{jk} = l_j m_k. \tag{24.31} $$

Here the means of A-classes are estimates of the individual corresponding $a$'s (though it
must not be overlooked that they are based on different numbers of members in margins),
and the sum of squares between A-means may be computed in the usual manner appropriate
to a one-way classification with unequal numbers. Similarly for B. The interactions
may be estimated by subtracting the A- and B-sums from the sum of squares between
classes. We leave it to the reader to verify these statements.
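Whether a given table of subclass numbers is proportionate in the sense of (24.31) is easily tested. The sketch below rests on the observation that $n_{jk} = l_j m_k$ holds for some marginal factors if and only if $N n_{jk} = n_{j.} n_{.k}$ for every cell; the function name and the two small tables of counts are hypothetical illustrations, not data from the text.

```python
def is_proportionate(counts, tol=1e-9):
    """True when every cell count equals (row total x column total) / N."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n_total = sum(row_totals)
    return all(
        abs(counts[j][k] * n_total - row_totals[j] * col_totals[k]) <= tol
        for j in range(len(counts))
        for k in range(len(counts[0]))
    )


proportionate = [[2, 4, 6], [3, 6, 9]]   # rows in the ratio 2 : 3
unbalanced = [[2, 4], [3, 5]]

print(is_proportionate(proportionate))
print(is_proportionate(unbalanced))
```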


Special Case of 2 × 2 × . . . Classification

24.15. The foregoing analysis can be extended to the n-way classification, but in
the general case the solution of the equations becomes rather complex and the arithmetic
a considerable nuisance. Where, however, the classifications are simple dichotomies the
problem simplifies to a great extent. For instance, in equations (24.27), if there are only
two values of $a_j$, which we may take to be $a$ and $-a$, we have

$$ n_{.k} b_k = \sum_j n_{jk} \bar{x}_{jk} - n_{1k} a + n_{2k} a. $$

We have selected the $a$'s so that $\sum (a) = 0$, which implies that the mean $m$ is amalgamated
with the $b$'s. Substituting for the $b$'s in (24.26), we find

$$ a \sum_k \left\{ n_{.k} - \frac{(n_{1k} - n_{2k})^2}{n_{.k}} \right\} = \sum_k \left\{ n_{1k} \bar{x}_{1k} - n_{2k} \bar{x}_{2k} - \frac{(n_{1k} - n_{2k})(n_{1k} \bar{x}_{1k} + n_{2k} \bar{x}_{2k})}{n_{.k}} \right\}, $$

which reduces to

$$ a \sum_k \frac{n_{1k} n_{2k}}{n_{1k} + n_{2k}} = \frac{1}{2} \sum_k \frac{n_{1k} n_{2k}}{n_{1k} + n_{2k}} (\bar{x}_{1k} - \bar{x}_{2k}). \tag{24.32} $$

Thus $a$ is, apart from the factor $\tfrac{1}{2}$, the weighted mean of the differences of
corresponding B-class means and may be determined direct. So generally for a
2 × 2 × 2 . . . classification. The differences may be tested for homogeneity by the
z-test, which in this case reduces to the t-test.
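The direct determination of $a$ can be sketched in a few lines. We assume here, as our own reading of the least-squares solution, that the two A-classes carry effects $a$ and $-a$ and that the weight attached to the $k$th B-class is $n_{1k} n_{2k} / (n_{1k} + n_{2k})$; the function name and the illustrative numbers are hypothetical.

```python
def dichotomy_effect(n1, n2, mean1, mean2):
    """Least-squares a for a two-class factor with effects +a and -a.

    n1, n2       : subclass numbers in A-class 1 and 2, one entry per B-class
    mean1, mean2 : the corresponding subclass means
    """
    weights = [a * b / (a + b) for a, b in zip(n1, n2)]
    weighted_differences = sum(
        w * (m1 - m2) for w, m1, m2 in zip(weights, mean1, mean2)
    )
    # Half the weighted mean of the differences of corresponding means.
    return 0.5 * weighted_differences / sum(weights)


# Hypothetical data built from a = 0.25 and B-effects 1, 2, 3 with no error.
n1, n2 = [3, 5, 2], [4, 1, 6]
mean1 = [1.25, 2.25, 3.25]
mean2 = [0.75, 1.75, 2.75]
estimate = dichotomy_effect(n1, n2, mean1, mean2)
print(estimate)
```

Because the illustrative means are exactly additive, the estimate recovers the value of $a$ used to build them whatever the subclass numbers.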


24.16. In view of the relative complexity of the non-orthogonal case, it is natural
to wonder whether any serious error would be committed if we regarded the $p \times q$ table
of array means as an ordinary two-way table with one member in each class and analysed
the variance accordingly. Evidently such a procedure sacrifices a lot of information about
variation in subclasses, but that is not the point. Is the analysis valid?

The hypothesis on which the analysis is based is equality of variance in subclasses.
If the numbers in subclasses are very unequal the means based on them will have very
unequal variances, and we expect that the analysis may be misleading. If, however, the
numbers are close to equality the analysis will probably be approximately correct.

Example 24.4

Reverting once again to the data considered in earlier examples, we have the following
analysis for the variance of the 2 × 8 table of class-means:

                     Sum of Squares.    d.f.    Quotient.
Between sex              0.3032           1      0.3032
Between breed            0.2635           7      0.0376
Residual                 0.2387           7      0.0341
Totals                   0.8054          15

The sum of squares between sex is the same as before, as it must be for a dichotomy,
but the effect of breed is seriously underestimated and would not be judged significant by
comparison with the interaction term, which is our residual. The numbers in the breed-classes
are, in fact, too different to justify the approximation.

The Missing Plot Technique

24.17. The simplicity of the analysis of variance in the orthogonal case and the
economy imported by keeping the number of values as low as possible often leads to the
carrying out of experiments with only one member in each subclass. But this has a certain
practical danger in that the value in a subclass may be lost through circumstances beyond
the experimenter's control. For instance, an animal may die in the course of an experiment,
or a crop on a particular plot may be ruined by pest; or sometimes a record may actually
be lost after measurements have been carried out. In such cases we may estimate the
missing values and perform a variance-analysis in the following way.

24.18. Consider in the first place a $p \times q$ classification with certain missing values,
$r$ in number. We assume as usual that the variate-values are expressible in the form

$$ x_{jk} = a_j + b_k + \zeta_{jk} + m, \tag{24.33} $$

and we know that the "best" estimators of the constants are

$$ m = \bar{x}_{..}, \qquad a_j = \bar{x}_{j.} - \bar{x}_{..}, \qquad b_k = \bar{x}_{.k} - \bar{x}_{..}. \tag{24.34} $$

The quantities on the right are, however, unknown to us because of the missing values.
Suppose that we estimate the constants by minimising

$$ \sum{}' (x_{jk} - a_j - b_k - m)^2, \tag{24.35} $$

where the summation $\sum'$ takes place over known values. Our estimators are then
determinate and may be written $a'_j$, $b'_k$ and $m'$.

We will now estimate the missing value on the plot $(j, k)$ by the equation

$$ X'_{jk} = a'_j + b'_k + m'. \tag{24.36} $$

We have

$$ S (x_{jk} - a_j - b_k - m)^2 = \sum{}' (x_{jk} - a_j - b_k - m)^2 + S_r (X_{jk} - a_j - b_k - m)^2, \tag{24.37} $$

where $S_r$ denotes summation over the $r$ missing values. Let us now consider this as a
function to be minimised, involving the unknowns $a$, $b$, $m$ and $r$ further unknowns
$X_{jk}$. The equations giving the latter will be obtained by differentiating (24.37) with
respect to each $X_{jk}$, and in fact are typified by

$$ X_{jk} = a'_j + b'_k + m', $$

that is to say, by (24.36). The other constants are given by such equations as

$$ \sum{}' (x_{jk} - a'_j - b'_k - m') + S_r (X'_{jk} - a'_j - b'_k - m') = 0. \tag{24.38} $$

The second term vanishes, and hence we obtain the same minimal values for $a'_j$, $b'_k$ and
$m'$ as by minimising (24.35) by itself. Furthermore, the equations of estimation (24.38)
may be written

$$ \sum (x_{jk} - a'_j - b'_k - m') = 0, \tag{24.39} $$

where the summation takes place over all values, those of the observed $x$'s where known
and over the estimated $X$'s where values are missing.

It follows that if we write $X_{jk}$ for the $r$ missing values, ascertain the residual sum of
squares, which will be a function of observations and these $r$ unknowns, and minimise
it for variation in these unknowns, we shall obtain equations providing estimates of the
unknowns equivalent to (24.36). The following example illustrates the method.

Example 24.5 (Yates, 1933b)

The following table shows the measurements of intensity of infection of certain potato
tubers under eight manurial treatments in ten blocks.

TABLE 24.6

Intensity of Infection of Potato Tubers.

                                                 Blocks
Treatments  1         2         3         4         5         6         7         8         9         10        Totals
1           3.55      2.29      b         2.00      3.34      3.83      3.86      3.50      2.23      2.91      27.51 + b
2           2.30      4.03      2.54      2.82      3.29      2.93      f         2.55      2.20      2.30      24.96 + f
3           3.96      3.62      3.46      2.50      2.94      3.70      3.82      2.54      3.18      3.69      33.41
4           2.99      3.99      2.90      3.97      4.49      4.70      3.86      h         3.50      3.59      33.99 + h
5           a         3.07      3.49      1.07      3.99      3.48      3.80      3.68      3.24      2.70      28.52 + a
6           2.36      3.47      2.64      3.17      3.26      3.28      g         i         3.07      3.12      24.37 + g + i
7           2.16      2.34      1.96      2.60      3.77      d         3.20      3.47      2.67      3.33      25.50 + d
8           3.16      2.52      2.39      3.68      c         e         3.85      3.36      2.50      4.13      25.59 + c + e
Totals      20.48+a   25.33     19.38+b   21.81     25.08+c   21.92+d+e 22.39+f+g 19.10+h+i 22.59     25.77     223.85 + a + b + c + d + e + f + g + h + i
There are nine missing values in this table, indicated by the letters $a \ldots i$. Omitting
purely numerical terms, which are irrelevant for the purposes of minimisation, we have
for the total sum of squares,

$$ a^2 + b^2 + c^2 + \ldots + i^2 - \tfrac{1}{80} (223.85 + a + b + c + \ldots + i)^2, $$

for the sum of squares between blocks,

$$ \tfrac{1}{8} \{ (20.48 + a)^2 + (19.38 + b)^2 + \ldots + (19.10 + h + i)^2 \} - \tfrac{1}{80} (223.85 + a + b + c + \ldots + i)^2, $$

and for that between treatments,

$$ \tfrac{1}{10} \{ (27.51 + b)^2 + (24.96 + f)^2 + \ldots + (25.59 + c + e)^2 \} - \tfrac{1}{80} (223.85 + a + b + c + \ldots + i)^2. $$

The residual sum of squares is the difference of the first and the sum of the second and
third of these expressions. For minimisation we differentiate with respect to $a, b, \ldots, i$
in turn. On some arithmetic simplification we find


 63a +   b +   c +   d +   e +   f +   g +   h +   i = 209.11
   a + 63b +   c +   d +   e +   f +   g +   h +   i = 190.03
   a +   b + 63c +   d -  7e +   f +   g +   h +   i = 231.67
   a +   b +   c + 63d -  9e +   f +   g +   h +   i = 199.35
   a +   b -  7c -  9d + 63e +   f +   g +   h +   i = 200.07
   a +   b +   c +   d +   e + 63f -  9g +   h +   i = 199.73
   a +   b +   c +   d +   e -  9f + 63g +   h -  7i = 195.01
   a +   b +   c +   d +   e +   f +   g + 63h -  9i = 239.07
   a +   b +   c +   d +   e +   f -  7g -  9h + 63i = 162.11


This set of linear equations can, of course, be solved by routine methods, but also by
iterative processes as follows:

The mean of existent values is 3.15. Assume this to be approximately the values of
$b, c, \ldots, i$. Then for $a$ we have, from the first of the above equations,

$$ a = \tfrac{1}{63} \{ 209.11 - (8 \times 3.15) \} = 2.92. $$

Taking this value of $a$ and 3.15 for $c, d, \ldots, i$, we find for $b$ from the second equation,

$$ b = \tfrac{1}{63} \{ 190.03 - (7 \times 3.15) - 2.92 \} = 2.62. $$

Similarly, from the third equation,

$$ c = \tfrac{1}{63} \{ 231.67 + (2 \times 3.15) - 2.92 - 2.62 \} = 3.69, $$

and so on. On reaching $i$ we recalculate $a$ from the first equation, using the approximations
to the values of the other constants already obtained; and so on until our values do not
alter. In this case only a second approximation is necessary, the values being:



                   a      b      c      d      e      f      g      h      i
First Approx.     2.92   2.62   3.69   3.27   3.76   3.26   3.60   3.88   3.22
Second Approx.    2.88   2.58   3.73   3.33   3.76   3.32   3.61   3.89   3.22

These are our estimates of missing yields. The treatment means are found to be

Treatment     1       2       3       4       5       6       7       8
Mean        3.009   2.828   3.341   3.788   3.140   3.120   2.883   3.308
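The iterative process of this example lends itself to mechanical computation. The sketch below is not the author's procedure verbatim: it revisits each missing plot in turn and replaces it by the value $(pR + qC - G) / \{(p - 1)(q - 1)\}$, where $R$, $C$ and $G$ are the treatment, block and grand totals of the other plots (current estimates included), which is the solution of the corresponding minimising equation for that plot. The table is Table 24.6 as we read it, `None` marking a missing plot; the function name and the stopping rule are our own assumptions.

```python
# Rows are treatments 1-8, columns are blocks 1-10; None marks a missing plot.
TABLE = [
    [3.55, 2.29, None, 2.00, 3.34, 3.83, 3.86, 3.50, 2.23, 2.91],
    [2.30, 4.03, 2.54, 2.82, 3.29, 2.93, None, 2.55, 2.20, 2.30],
    [3.96, 3.62, 3.46, 2.50, 2.94, 3.70, 3.82, 2.54, 3.18, 3.69],
    [2.99, 3.99, 2.90, 3.97, 4.49, 4.70, 3.86, None, 3.50, 3.59],
    [None, 3.07, 3.49, 1.07, 3.99, 3.48, 3.80, 3.68, 3.24, 2.70],
    [2.36, 3.47, 2.64, 3.17, 3.26, 3.28, None, None, 3.07, 3.12],
    [2.16, 2.34, 1.96, 2.60, 3.77, None, 3.20, 3.47, 2.67, 3.33],
    [3.16, 2.52, 2.39, 3.68, None, None, 3.85, 3.36, 2.50, 4.13],
]


def estimate_missing(table, sweeps=50, tol=1e-6):
    """Fill missing plots so that the two-way residual sum of squares is least."""
    p, q = len(table), len(table[0])
    missing = [(j, k) for j in range(p) for k in range(q) if table[j][k] is None]
    known = [v for row in table for v in row if v is not None]
    start = sum(known) / len(known)          # mean of existent values
    filled = [[start if v is None else v for v in row] for row in table]
    for _ in range(sweeps):
        largest_change = 0.0
        for j, k in missing:
            old = filled[j][k]
            row_rest = sum(filled[j]) - old
            col_rest = sum(row[k] for row in filled) - old
            grand_rest = sum(map(sum, filled)) - old
            new = (p * row_rest + q * col_rest - grand_rest) / ((p - 1) * (q - 1))
            largest_change = max(largest_change, abs(new - old))
            filled[j][k] = new               # use the new value at once
        if largest_change < tol:
            break
    return filled, missing


filled, missing = estimate_missing(TABLE)
estimates = {cell: filled[cell[0]][cell[1]] for cell in missing}
for (j, k), value in sorted(estimates.items()):
    print(f"treatment {j + 1}, block {k + 1}: {value:.2f}")
```

The order in which the plots are visited differs from that of the text, but the process settles on the same solution of the nine equations.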
24.19. The question now arises how we may analyse the variance of data for which
missing values have been estimated in this way.

The original data provided a classification with unequal numbers in subclasses and
can be analysed by the methods given earlier in the chapter; except that, since no subclass
contains more than one member, we cannot find a residual sum of squares within subclasses
based on $N - pq$ d.f. ($N - pq$, in fact, is a negative number.) For instance,
regarding the data as a one-way classification with $pq - r$ classes, we shall have an analysis
of this type:

Sums of squares         d.f.
Between classes *       $p + q - 2$
Residual                $(p - 1)(q - 1) - r$
Total                   $pq - r - 1$

The effect of the two classifications separately can be dealt with in the manner of
Example 24.1.

24.20. Two simplifications are possible. In the first place, since the minimisation
of the residual is the same for the original data as for the data completed by estimates of
missing values, we can use the latter to compute the residual precisely as for an orthogonal
case, which simplifies the arithmetic.

Secondly, it appears that to an adequate approximation we may substitute the estimated
values for missing values and analyse the resulting material in the ordinary way
as if it were orthogonal. If the proportion of missing values is high this approximation
may perhaps break down, and in practice we should probably regard the experiment as
ruined. More usually only a few records are missing, and the effect of replacing them by
estimates is hardly likely to affect judgments of significance seriously.
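The ordinary analysis of a completed table may be sketched as follows. The function name, its argument `n_estimated` and the small table are our own hypothetical illustrations; the one departure from the fully orthogonal analysis, namely that the residual loses one degree of freedom for each estimated value, follows the practice of the example in the text.

```python
def two_way_anova(table, n_estimated=0):
    """Orthogonal two-way analysis with one value per cell.

    n_estimated is the number of cells filled by missing-plot estimates;
    the residual degrees of freedom are reduced by that number.
    """
    p, q = len(table), len(table[0])
    grand_total = sum(map(sum, table))
    correction = grand_total ** 2 / (p * q)
    total = sum(v * v for row in table for v in row) - correction
    rows = sum(sum(row) ** 2 for row in table) / q - correction
    columns = sum(sum(col) ** 2 for col in zip(*table)) / p - correction
    sums = {"rows": rows, "columns": columns,
            "residual": total - rows - columns, "total": total}
    df = {"rows": p - 1, "columns": q - 1,
          "residual": (p - 1) * (q - 1) - n_estimated}
    df["total"] = df["rows"] + df["columns"] + df["residual"]
    return sums, df


# Hypothetical exactly additive table: the residual should vanish.
sums, df = two_way_anova([[1, 2, 3], [2, 3, 4], [4, 5, 6]], n_estimated=1)
print(sums)
print(df)
```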

Example 24.6

Continuing the analysis of the data of the previous example, we find, for the total sum
of squares, 32.1012 with 70 d.f. The analysis of the completed data, that is to say the original
data plus the estimates of missing values, is as follows:

                          Sum of Squares.    d.f.    Quotient.
Between blocks                 9.7176          9      1.0797
Between treatments             6.5812          7      0.9402
Residual                      17.6902         54      0.3276
Totals                        33.9890         70

* It is assumed that no row or column in the two-way classification is entirely empty. If it were,
we should have to ignore it and confine attention to the remaining arrays.

Treating the original data as a case of unequal class numbers we find

                          Sum of Squares.    d.f.    Quotient.
Blocks and treatments         14.4110         16      0.9007
Residual                      17.6902         54      0.3276
Totals                        32.1012         70

For blocks only:

                          Sum of Squares.    d.f.    Quotient.
Between blocks                 8.5690          9      0.9521
Remainder                      5.8420          7      0.8346
Blocks and treatments         14.4110         16

For treatments only:

                          Sum of Squares.    d.f.    Quotient.
Between treatments             6.2648          7      0.8950
Remainder                      8.1462          9      0.9051
Blocks and treatments         14.4110         16

Whether we use the analysis of completed data or the more exact form, we see that
differences between blocks and between treatments are significant as judged by the residual
variance. The two analyses are, in fact, not very different, and even with as many as nine
missing values out of 80 we should not err by substituting estimated values and treating
the data as orthogonal.


Relationship with Regression Analysis

24.21. The general n-way classifications to which variance-analysis may be applied
are not necessarily determined by a measurable variate. As for contingency tables, rows
or columns can be interchanged without affecting the analysis. We can, however, regard
a multivariate frequency table as an n-way classification and apply variance-analysis to
it; and just as regression and correlation analysis provide a refinement on contingency
analysis because of the arrangement of the classes in order by reference to a variate, so we
may to some extent refine the analysis of variance in such a case.

24.22. Consider in the first instance a $p \times q$ table of frequencies in the form of a
correlation table. We will suppose the A-classification to be according to the variate $x$
and the B-classification according to $y$. Let us now consider the hypothesis that the data
emanate from a normal bivariate population with zero correlation (or, somewhat more
generally, that for any given $y$ the $x$'s are distributed normally with the same mean and
variance). We can then regard the data as a one-way classification according to $y$ with
unequal frequencies and analyse the variance in the usual form:


                    Sum of Squares.                                                         d.f.       Quotient.
Between classes     $\sum_j n_j (\bar{x}_j - \bar{x})^2 = N \eta^2 \operatorname{var} x$           $q - 1$    $N \eta^2 \operatorname{var} x / (q - 1)$
Residual            $\sum_{i,j} (x_{ij} - \bar{x}_j)^2 = N (1 - \eta^2) \operatorname{var} x$      $N - q$    $N (1 - \eta^2) \operatorname{var} x / (N - q)$
Totals              $N \operatorname{var} x$                                                       $N - 1$

Here $\bar{x}_j$ is the mean of $x$-values in the $j$th $y$-class, $\bar{x}$ is the mean of all $N$
values, $x_{ij}$ is the variate-value in the $i$th $x$-class and $j$th $y$-class, and there are $q$
$y$-classes. The quotients are expressible in terms of the correlation ratio of $x$ on $y$,
viz. $\eta$ (cf. 14.23, vol. I, p. 351).

Now, on our hypothesis, the sums of squares between classes and the residual are
independently distributed in the Type III form, and hence the variance ratio

$$ \frac{\eta^2}{1 - \eta^2} \cdot \frac{N - q}{q - 1} \tag{24.41} $$

can be tested in Fisher's distribution with $\nu_1 = q - 1$, $\nu_2 = N - q$. This is the test we
gave in 14.25 (vol. I, p. 353) and it is reached by an argument of essentially the same
kind.
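The correlation ratio and the variance ratio (24.41) may be computed from grouped values as in the sketch below. The function name and the three small groups are hypothetical; each inner list holds the $x$-values falling in one $y$-class.

```python
def correlation_ratio_test(groups):
    """Return eta^2 of x on y, the variance ratio (24.41) and its d.f."""
    n_total = sum(len(g) for g in groups)
    q = len(groups)
    grand_mean = sum(map(sum, groups)) / n_total
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    eta_squared = between / total
    ratio = (eta_squared / (q - 1)) / ((1 - eta_squared) / (n_total - q))
    return eta_squared, ratio, (q - 1, n_total - q)


# Hypothetical x-values in three y-classes.
eta_squared, ratio, dof = correlation_ratio_test([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(eta_squared, ratio, dof)
```

Here the sum of squares between classes is 42 on 2 d.f. and the residual 6 on 6 d.f., so the ratio is the same as would be found from the table of sums of squares directly.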


24.23. Now suppose that our $p \times q$ table is normal but correlated; or, somewhat
more generally, that the values in arrays of constant $y$ are normally distributed with the
same variance but with means which vary linearly with $y$, say

$$ \bar{x}_j = m + b y_j. \tag{24.42} $$

Then our data can be represented by the form

$$ x_{ij} = m + b y_j + \zeta_{ij}, \tag{24.43} $$

where the $\zeta$'s are distributed normally with zero mean and the same variance $v$. Apart
from the constant $m$, the only unknown here is the constant $b$. Our least-squares estimates
(measuring from the means of $x$ and $y$) now lead to the familiar form for the regression
coefficient

$$ b = \frac{\sum (xy)}{\sum (y^2)}, \tag{24.44} $$

where summation takes place over all values observed. This is, of course, equivalent to

$$ b = \frac{\operatorname{cov} (x, y)}{\operatorname{var} y}. \tag{24.45} $$

Further, the reduction in sum of squares attributable to fitting the constant $b$ is

$$ N b \operatorname{cov} (x, y) = \frac{N \operatorname{cov}^2 (x, y)}{\operatorname{var} y} = N r^2 \operatorname{var} x, \tag{24.46} $$

where $r$ is the correlation coefficient of the sample.
Our analysis of variance may then be written:

TABLE 24.7

Analysis of Variance of a Correlation Table

                                                   Sum of Squares.                              d.f.       Quotient.
Regression constant $b$                            $N r^2 \operatorname{var} x$                 1          $N r^2 \operatorname{var} x$
Between classes (after regression is eliminated)   $N (\eta^2 - r^2) \operatorname{var} x$      $q - 2$    $N (\eta^2 - r^2) \operatorname{var} x / (q - 2)$
Residual                                           $N (1 - \eta^2) \operatorname{var} x$        $N - q$    $N (1 - \eta^2) \operatorname{var} x / (N - q)$
Totals                                             $N \operatorname{var} x$                     $N - 1$

This analysis gives us a test of the significance of the correlation coefficient in samples
from an uncorrelated population and also of linearity of regression.

In fact, if the parent correlation is zero, the parent value of $b$ is zero and the quotient
due to $b$ is independent of the sum of the other items in the analysis. Thus the ratio

$$ \frac{N r^2 \operatorname{var} x}{N (1 - r^2) \operatorname{var} x / (N - 2)} = \frac{r^2 (N - 2)}{1 - r^2} \tag{24.47} $$

is distributed in Fisher's form with $\nu_1 = 1$, $\nu_2 = N - 2$. This is equivalent to saying that

$$ t = r \sqrt{\frac{N - 2}{1 - r^2}} \tag{24.48} $$

is distributed in "Student's" form with $N - 2$ d.f., which brings us back by a different
route to the test given in 14.15 (vol. I, p. 342).


24.24. Secondly, if we assume that the parent correlation is not zero but the regression
is linear, the sum of squares between classes after regression is eliminated is independent
of the residual in Table 24.7, and hence the ratio

$$ \frac{N \operatorname{var} x \, (\eta^2 - r^2) / (q - 2)}{N \operatorname{var} x \, (1 - \eta^2) / (N - q)} = \frac{\eta^2 - r^2}{1 - \eta^2} \cdot \frac{N - q}{q - 2} \tag{24.49} $$

is distributed in Fisher's form with $\nu_1 = q - 2$, $\nu_2 = N - q$. This test (due to Fisher
himself) gives a test of linearity of regression in the normal case.
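The three quantities of Table 24.7 and the ratios derived from them can be sketched together. The function name, the dictionary keys and the illustrative data are hypothetical; `y_values[j]` is taken to be the variate-value attached to the $j$th $y$-class and `groups[j]` the $x$-values falling in it.

```python
from math import sqrt


def regression_and_linearity(y_values, groups):
    """r, eta^2 and the ratios for regression and for linearity of regression."""
    n = sum(len(g) for g in groups)
    q = len(groups)
    x_mean = sum(map(sum, groups)) / n
    y_mean = sum(y * len(g) for y, g in zip(y_values, groups)) / n
    sxx = sum((x - x_mean) ** 2 for g in groups for x in g)
    syy = sum(len(g) * (y - y_mean) ** 2 for y, g in zip(y_values, groups))
    sxy = sum((y - y_mean) * (x - x_mean)
              for y, g in zip(y_values, groups) for x in g)
    r = sxy / sqrt(sxx * syy)
    eta2 = sum(len(g) * (sum(g) / len(g) - x_mean) ** 2 for g in groups) / sxx
    return {
        "r": r,
        "eta2": eta2,
        # ratio for the regression constant, 1 and N - 2 d.f.
        "F_regression": r * r * (n - 2) / (1 - r * r),
        # the equivalent "Student" form, N - 2 d.f.
        "t": r * sqrt((n - 2) / (1 - r * r)),
        # ratio for departure from linearity, q - 2 and N - q d.f.
        "F_linearity": ((eta2 - r * r) / (q - 2)) / ((1 - eta2) / (n - q)),
    }


result = regression_and_linearity([0, 1, 2], [[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(result)
```

The square of the "Student" statistic equals the variance ratio for the regression constant, which is the equivalence remarked upon in the text.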

It should be noticed that this test is only approximate if the classification is one of
a normal population with broad groupings. If correlation exists, the distribution of a
bivariate normal sample in an array of finite width is not exactly normal, being the sum
of a number of normal distributions with slightly different means. Unless the grouping
is very coarse, this is not likely to invalidate tests of significance in practice.


24.25. Consider now the general regression formula for $p$ variates,

$$ x_1 = b_2 x_2 + b_3 x_3 + \ldots + b_p x_p. \tag{24.50} $$

If we assume that the residuals $x_1 - \sum_{j=2}^{p} b_j x_j$ (say $\zeta$) are distributed normally
with constant variance, our least-squares estimates of the regression coefficients are those
given by the usual theory, and the fitting of $(p - 1)$ constants reduces the sum of squares by
$N \operatorname{var} x_1 \, R^2$, where $R$ is the multiple correlation coefficient (cf. 15.16, vol. I,
p. 380). We then have the analysis:

                                           Sum of Squares.                          d.f.       Quotient.
Between classes (regression constants)     $N R^2 \operatorname{var} x_1$           $p - 1$    $N R^2 \operatorname{var} x_1 / (p - 1)$
Residual                                   $N (1 - R^2) \operatorname{var} x_1$     $N - p$    $N (1 - R^2) \operatorname{var} x_1 / (N - p)$
Totals                                     $N \operatorname{var} x_1$               $N - 1$

If the regression is in fact linear of type (24.50), the residual quotient is independent of
that due to fitting regression constants, and the hypothesis may be tested by means of
the ratio

$$ \frac{R^2}{1 - R^2} \cdot \frac{N - p}{p - 1}, \tag{24.51} $$

which is distributed in Fisher's form with $\nu_1 = p - 1$, $\nu_2 = N - p$. This brings us to
the distribution of $R^2$ given in 15.20.


24.26. It is to be observed that in (24.50) we may choose the variates $x_2, \ldots, x_p$
as we please. In particular, we can take them to be polynomials of a single variate. From
this point of view the analysis of variance links up with the theory of regression analysis,
given in Chapter 22. If the polynomials are orthogonal we can fit the constants $b$ one
at a time, the fitting of any constant leaving unchanged the previous determination of those
of lower orders. The reduction in sum of squares for each constant can be separately
ascertained and corresponds to the loss of a further degree of freedom; and at any stage
we may test the residual variance to see whether any particular term is worth while in the
sense that it makes a significant contribution to the total variance. The exact test, of
course, depends on the usual assumptions of normality.

24.27. The reader is now in a position to see a number of statistical topics which 
on the surface appear to be distinct as parts of a single theory. Regression analysis, with 
its subsidiary of correlation analysis, proceeds by the successive fitting of constants by 
least-squares. For the normal case this is equivalent to estimation by maximum likelihood. 
Partial and multiple regression, together with curvilinear regression, can all be subsumed
under this central idea. The fitting of each constant splits off a separate contribution to
the total variance which, under certain hypotheses, is independent of the others. Variance-
analysis proceeds in much the same way, but is more general in the sense that it can deal 
with the classification of values, however determined. Our various exact tests of signifi- 
cance of homogeneity in variance, of linearity of regression, of significance of correlations 
in uncorrelated material, of the difference of two means where variances are equal, of the 
correlation ratios, of the multiple correlation coefficient — all derive ultimately from Fisher’s 
distribution of the variance-ratio in the normal case. 


The Analysis of Covariance

24.28. Suppose that we have a one-way classification, possibly with unequal numbers,
and that in each class the members present values not of a single variate, such as we have
considered up to now, but pairs of variate-values typified by $x_{ij}$, $y_{ij}$, $j$ referring as usual
to class and $i$ to the number within the class. By the ordinary methods of variance-analysis
we can discuss the effect of classification either on the $x$-variate or on the $y$-variate; but
there also arises for consideration the effect of class-membership on the covariation of
$x$ and $y$. This leads us to an extension of the analysis of variance to that of covariance.


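24.29. By an easy extension of the results for a single variate we have, analogously to

$$ \sum_{i,j} (x_{ij} - \bar{x}_{..})^2 = \sum_{i,j} (x_{ij} - \bar{x}_{.j})^2 + \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})^2, $$

the equation in product terms

$$ \sum_{i,j} (x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..}) = \sum_{i,j} (x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j}) + \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..}). \tag{24.52} $$

If we consider the whole sample as homogeneous the correlation between $x$ and $y$ is given by

$$ \frac{\sum (x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..})}{\{ \sum (x_{ij} - \bar{x}_{..})^2 \sum (y_{ij} - \bar{y}_{..})^2 \}^{1/2}}. \tag{24.53} $$

We have also the correlation between means of classes

$$ \frac{\sum n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..})}{\{ \sum n_j (\bar{x}_{.j} - \bar{x}_{..})^2 \sum n_j (\bar{y}_{.j} - \bar{y}_{..})^2 \}^{1/2}}, \tag{24.54} $$

and may calculate a correlation of residuals within classes

$$ \frac{\sum (x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j})}{\{ \sum (x_{ij} - \bar{x}_{.j})^2 \sum (y_{ij} - \bar{y}_{.j})^2 \}^{1/2}}. \tag{24.55} $$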

24.30. If there is heterogeneity present we should expect these correlations to differ;
and similarly for the three kinds of regression of $y$ on $x$, such as

$$ b = \frac{\sum (x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..})}{\sum (x_{ij} - \bar{x}_{..})^2}. \tag{24.56} $$

The three correlations of (24.53)-(24.55) are, however, not additive, like sums of squares;
nor are the regressions corresponding. The covariances expressed by (24.52) are additive,
but there is no simple test, such as exists for variance-ratios, to determine the significance
of differences or ratios of covariances. Covariance analysis, however, is not primarily
designed to test independence, but to examine whether there is any variation according
to class between the regressions of $y$ on $x$ within and between classes. Let us suppose
that there is some linear relation of the form


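$$ y = \alpha + \beta x. \tag{24.57} $$

Following the notation of E. S. Pearson, we write

$$ C_{11j} = \sum_i (x_{ij} - \bar{x}_{.j})^2, \qquad C_{22j} = \sum_i (y_{ij} - \bar{y}_{.j})^2, \qquad C_{12j} = \sum_i (x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j}), \tag{24.58} $$

$$ C_{11a} = \sum_j C_{11j}, \qquad C_{22a} = \sum_j C_{22j}, \qquad C_{12a} = \sum_j C_{12j}, \tag{24.59} $$

$$ C_{11m} = \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})^2, \qquad C_{22m} = \sum_j n_j (\bar{y}_{.j} - \bar{y}_{..})^2, \qquad C_{12m} = \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..}), \tag{24.60} $$

and $C_{110}$, $C_{220}$, $C_{120}$ for the corresponding total sums of squares and products. We may
then exhibit the composition of the total sums of squares and products in the form of Table
24.8. The arithmetic of the analysis follows that of ordinary variance-analysis. We
shall give an example presently.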
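The sums of squares and products within groups, between groups and for the whole may be accumulated as in the following sketch, which also lets the additive property be verified numerically. The function name and the two small groups are hypothetical; each group is a pair of equal-length lists of $x$- and $y$-values, and each result is returned in the order (sum of squares of $x$, sum of squares of $y$, sum of products).

```python
def sums_of_squares_and_products(groups):
    """Within-group, between-group and total sums of squares and products."""
    n_total = sum(len(xs) for xs, _ in groups)
    grand_x = sum(sum(xs) for xs, _ in groups) / n_total
    grand_y = sum(sum(ys) for _, ys in groups) / n_total
    within, between, total = [0.0] * 3, [0.0] * 3, [0.0] * 3
    for xs, ys in groups:
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        # deviations from the class means
        within[0] += sum((x - mean_x) ** 2 for x in xs)
        within[1] += sum((y - mean_y) ** 2 for y in ys)
        within[2] += sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        # deviations of class means from the general means, weighted by n
        between[0] += n * (mean_x - grand_x) ** 2
        between[1] += n * (mean_y - grand_y) ** 2
        between[2] += n * (mean_x - grand_x) * (mean_y - grand_y)
        # deviations from the general means
        total[0] += sum((x - grand_x) ** 2 for x in xs)
        total[1] += sum((y - grand_y) ** 2 for y in ys)
        total[2] += sum((x - grand_x) * (y - grand_y) for x, y in zip(xs, ys))
    return within, between, total


within, between, total = sums_of_squares_and_products(
    [([1, 2, 3], [2, 4, 7]), ([4, 6], [5, 9])]
)
print(within, between, total)
```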


TABLE 24.8

Analysis of Variance and Covariance for One-Way Classification: Sums of Squares and
Products and Regression Coefficients.

Variation.             d.f.         Sum of Squares,   Sum of Squares,   Sum of Products,   Regression
                                    x-variate.        y-variate.        xy.                Coefficients.
Within jth group       $n_j - 1$    $C_{11j}$         $C_{22j}$         $C_{12j}$          $b_j = C_{12j} / C_{11j}$
Within groups          $N - p$      $C_{11a}$         $C_{22a}$         $C_{12a}$          $b_a = C_{12a} / C_{11a}$
Between groups         $p - 1$      $C_{11m}$         $C_{22m}$         $C_{12m}$          $b_m = C_{12m} / C_{11m}$
Totals                 $N - 1$      $C_{110}$         $C_{220}$         $C_{120}$          $b_0 = C_{120} / C_{110}$

We now suppose that, apart from the regression effects represented by (24.57), the
variation of $y$ is normal with constant variance $v$. We can then compile various estimates
of $v$ from the residual variation after the effect of fitting regression constants has been
removed. For instance, within classes we have for the estimator of $v$, with $N - 2p$ degrees
of freedom,

$$ \frac{1}{N - 2p} \sum_{i,j} \{ y_{ij} - \bar{y}_{.j} - b_j (x_{ij} - \bar{x}_{.j}) \}^2. $$

The number of degrees of freedom follows from the fact that we have fitted a mean and
a regression coefficient to each of $p$ classes, making a reduction of $2p$ in all. We then obtain


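TABLE 24.9

Analysis of Covariance for One-Way Classification with Linear Regressions.

Variation due to deviations from linear regressions within classes; d.f. $N - 2p$; sum of squares

$$ \sum_{i,j} \{ y_{ij} - \bar{y}_{.j} - b_j (x_{ij} - \bar{x}_{.j}) \}^2 = \sum_j (C_{22j} - b_j C_{12j}) = S_1. $$

Variation due to differences among regressions; d.f. $p - 1$; sum of squares

$$ \sum_{i,j} (b_j - b_a)^2 (x_{ij} - \bar{x}_{.j})^2 = \sum_j (b_j C_{12j}) - b_a C_{12a} = S_2. $$

Variation due to deviations within classes from linear regression $b_a$; d.f. $N - p - 1$; sum of squares

$$ \sum_{i,j} \{ y_{ij} - \bar{y}_{.j} - b_a (x_{ij} - \bar{x}_{.j}) \}^2 = C_{22a} - b_a C_{12a} = S_1 + S_2. $$

Variation due to deviations between classes from linear regression $b_m$; d.f. $p - 2$; sum of squares

$$ \sum_{i,j} \{ \bar{y}_{.j} - \bar{y}_{..} - b_m (\bar{x}_{.j} - \bar{x}_{..}) \}^2 = C_{22m} - b_m C_{12m} = S_3. $$

Variation due to differences between $b_a$ and $b_m$; d.f. 1; sum of squares

$$ \sum_{i,j} \{ (b_a - b_0)(x_{ij} - \bar{x}_{.j}) + (b_m - b_0)(\bar{x}_{.j} - \bar{x}_{..}) \}^2 = (b_a - b_m)^2 \frac{C_{11a} C_{11m}}{C_{110}} = S_4. $$

Total deviation from linear regression $b_0$; d.f. $N - 2$; sum of squares

$$ \sum_{i,j} \{ y_{ij} - \bar{y}_{..} - b_0 (x_{ij} - \bar{x}_{..}) \}^2 = C_{220} - b_0 C_{120} = S_1 + S_2 + S_3 + S_4. $$

The reader will probably find it useful to check the expressions in the third column of
Table 24.9 and to examine how the sum of squares of deviations from the regression line
of the whole is analysed into the constituent items.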


24.31. Suppose now that we wish to test whether the relationship between x and y
can be represented by the formula (24.57), and that there is no material class-effect present.
Then $S_1$ of Table 24.9 should be an unbiassed estimator of $(N - 2p)v$ and should be
independent of the residual estimator $S_2 + S_3 + S_4$, which has $2p - 2$ d.f. We may therefore
test the hypothesis by the ratio

$$ \frac{2p - 2}{N - 2p} \cdot \frac{S_1}{S_2 + S_3 + S_4}, \qquad \nu_1 = N - 2p, \quad \nu_2 = 2p - 2. \tag{24.61} $$


If this variance ratio is insignificant we consider next whether the regressions differ in
the p classes. For this purpose we compare the estimator derived from $S_2$ with that based
on $S_1$; i.e. the ratio

$$ \frac{S_2}{p - 1} \cdot \frac{N - 2p}{S_1}, \qquad \nu_1 = p - 1, \quad \nu_2 = N - 2p, \tag{24.62} $$

will be significant if differences are to be regarded as real.

If this ratio is not significant, $S_1$ and $S_2$ may be pooled. Comparison of their sum
with $S_3$ will afford a test whether the relation between group means is linear. The ratio
for this purpose is

$$ \frac{S_1 + S_2}{N - p - 1} \cdot \frac{p - 2}{S_3}, \qquad \nu_1 = N - p - 1, \quad \nu_2 = p - 2. \tag{24.63} $$


Finally, even if this ratio is not significant, it does not follow that the common regression
within groups is the same as the regression of the means of groups. To test this point
we consider the ratio

$$ \frac{S_1 + S_2}{N - p - 1} \cdot \frac{1}{S_4}, \qquad \nu_1 = N - p - 1, \quad \nu_2 = 1. \tag{24.64} $$
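The four ratios (24.61) to (24.64) can be collected in a short routine. What follows is only a sketch: the function name `variance_ratios` and its argument names are ours, and the illustrative call uses the sums reported in Table 24.11 for Example 24.7, with $S_2$ taken as 175.4 (the value consistent with the quotient 87.7 printed there). The ratios are formed in the order in which the text gives them; where one falls below unity, its reciprocal, with the degrees of freedom interchanged, is what would be referred to the upper tail of a variance-ratio table.

```python
def variance_ratios(s1, s2, s3, s4, n_total, p):
    """Return {equation number: (ratio, nu1, nu2)} for (24.61)-(24.64).

    s1..s4 are the sums S1..S4 of Table 24.9, n_total is N and p is the
    number of classes (p must exceed 2 for S3 to have any degrees of freedom).
    """
    within_ms = s1 / (n_total - 2 * p)          # deviations from the separate regressions b_j
    pooled_ms = (s1 + s2) / (n_total - p - 1)   # deviations from the common regression b_a
    return {
        "24.61": (within_ms / ((s2 + s3 + s4) / (2 * p - 2)), n_total - 2 * p, 2 * p - 2),
        "24.62": ((s2 / (p - 1)) / within_ms, p - 1, n_total - 2 * p),
        "24.63": (pooled_ms / (s3 / (p - 2)), n_total - p - 1, p - 2),
        "24.64": (pooled_ms / s4, n_total - p - 1, 1),
    }


# Illustration with the sums of Table 24.11 (N = 35 recruits, p = 3 groups).
ratios = variance_ratios(1048.1, 175.4, 835.6, 19.3, n_total=35, p=3)
for equation, (ratio, nu1, nu2) in ratios.items():
    print(f"({equation}) ratio = {ratio:.4f}, nu1 = {nu1}, nu2 = {nu2}")
```

The routine does no more than the arithmetic; the judgment of significance still requires tables of the variance-ratio distribution.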


Example 24.7 

A number of recruits are given a preliminary test to ascertain their suitability for a 
certain course of training. At the end of the training course they undergo a proficiency 
test. The marks for three groups of recruits from three different towns are — 


Group 1    Preliminary:  45, 50, 56, 58, 59, 60, 62, 64, 65, 75
           Proficiency:  46, 60, 52, 46, 48, 50, 55, 63, 58, 64

Group 2    Preliminary:  44, 49, 52, 52, 58, 59, 60, 62, 63, 63, 66, 69, 70, 72, 73
           Proficiency:  48, 55, 45, 60, 65, 64, 69, 71, 77, 70, 75, 80, 72, 75, 81

Group 3    Preliminary:  47, 52, 59, 60, 63, 66, 68, 69, 74, 76
           Proficiency:  43, 56, 51, 72, 60, 61, 55, 74, 72, 80.


We are interested here in the efficiency of the preliminary test as a predictor of the
proficiency test. We therefore consider the regression of the marks obtained in the latter
(y) on those obtained in the former (x). We are, however, also very much interested in
the question whether the regressions are the same, apart from purely sampling effects,
in the three groups. Such a matter would naturally arise, for instance, if we were thinking
of applying the same rejection standards in preliminary tests to all recruits, irrespective of
their town of origin.

Our scores are given to the nearest unit, and hence the variates are discontinuous. 
We will neglect this effect and assume that the scores are distributed approximately 
normally. 

About origin x = y = 50 the sums, sums of squares and cross-products are:

               n      Σ(x)     Σ(y)     Σ(x²)     Σ(y²)     Σ(xy)

Group 1       10        94       42      1496       594       694

Group 2       15       162      257      2802      6101      3989

Group 3       10       134      124      2556      2776      2422

We can then calculate the quantities C. For instance,

$$ C_{111} = 1496 - \frac{94^2}{10} = 612.4 $$

$$ C_{121} = 694 - \frac{94 \times 42}{10} = 299.2 $$

$$ C_{11a} = C_{111} + C_{112} + C_{113} . $$

We find the following table in the form of Table 24.8:

TABLE 24.10

Analysis of Variance and Covariance for Data of Example 24.7: Sums of Squares and Products
and Regressions

Variation               d.f.    Sum of Squares x²       Sum of Squares y²       Sum of Products xy      Regressions

Within first group        9     $C_{111}$ = 612.4       $C_{221}$ = 417.6       $C_{121}$ = 299.2       $b_1$ = 0.4886

  "    second group      14     $C_{112}$ = 1052.4      $C_{222}$ = 1697.73     $C_{122}$ = 1213.4      $b_2$ = 1.1530

  "    third group        9     $C_{113}$ = 760.4       $C_{223}$ = 1238.4      $C_{123}$ = 760.4       $b_3$ = 1.0000

Within groups            32     $C_{11a}$ = 2425.2      $C_{22a}$ = 3353.73     $C_{12a}$ = 2273.0      $b_a$ = 0.9372

Between groups            2     $C_{11m}$ = 83.09       $C_{22m}$ = 1005.01     $C_{12m}$ = 118.57      $b_m$ = 1.4270

Totals                   34     $C_{110}$ = 2508.29     $C_{220}$ = 4358.74     $C_{120}$ = 2391.57     $b_0$ = 0.9535
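The entries of Table 24.10 can be checked directly from the marks of Example 24.7. The sketch below is ours, not part of the original computation: the helper `corrected_sums` and the variable names are illustrative. It works from the raw marks rather than from the origin x = y = 50, which leaves the corrected sums unchanged.

```python
# Marks of Example 24.7: (preliminary x, proficiency y) for each group.
groups = {
    1: ([45, 50, 56, 58, 59, 60, 62, 64, 65, 75],
        [46, 60, 52, 46, 48, 50, 55, 63, 58, 64]),
    2: ([44, 49, 52, 52, 58, 59, 60, 62, 63, 63, 66, 69, 70, 72, 73],
        [48, 55, 45, 60, 65, 64, 69, 71, 77, 70, 75, 80, 72, 75, 81]),
    3: ([47, 52, 59, 60, 63, 66, 68, 69, 74, 76],
        [43, 56, 51, 72, 60, 61, 55, 74, 72, 80]),
}


def corrected_sums(x, y):
    """Return (C11, C22, C12): sums of squares and products about the means."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    c11 = sum(v * v for v in x) - sx * sx / n
    c22 = sum(v * v for v in y) - sy * sy / n
    c12 = sum(a * b for a, b in zip(x, y)) - sx * sy / n
    return c11, c22, c12


within = {j: corrected_sums(x, y) for j, (x, y) in groups.items()}

# "Within groups" row: the three within-group rows added together.
c11a = sum(c[0] for c in within.values())
c22a = sum(c[1] for c in within.values())
c12a = sum(c[2] for c in within.values())

# "Totals" row: all 35 recruits treated as a single sample.
all_x = [v for x, _ in groups.values() for v in x]
all_y = [v for _, y in groups.values() for v in y]
c110, c220, c120 = corrected_sums(all_x, all_y)

# "Between groups" row: obtained by subtraction.
c11m, c22m, c12m = c110 - c11a, c220 - c22a, c120 - c12a

for j, (c11, c22, c12) in within.items():
    print(f"group {j}: C11 = {c11:.2f}, C22 = {c22:.2f}, C12 = {c12:.2f}, b = {c12 / c11:.4f}")
print(f"within : b_a = {c12a / c11a:.4f}")
print(f"between: b_m = {c12m / c11m:.4f}")
print(f"total  : b_0 = {c120 / c110:.4f}")
```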


A comparison of the three regressions within groups indicates some heterogeneity.
It looks as if the preliminary test is not such a good predictor for the first group as for
the others. We may proceed to test the reality of this effect by constructing Table 24.11
on the lines of Table 24.9. For instance,

$$ S_1 = \sum_j (C_{22j} - C_{12j} b_j) = (417.6 - 299.2 \times 0.4886) + \text{(two similar terms)}. $$

We find:

TABLE 24.11

Analysis of Covariance of Data of Example 24.7: Linear Regressions.

Variation                                  d.f.     Sums S                                      Quotient

Deviations from regressions $b_j$           29      $S_1$ = 1048.1                                36.1

Differences among $b_j$                      2      $S_2$ = 175.4                                 87.7

Deviations from $b_a$                       31      $S_1 + S_2$ = 1223.5                          39.5

Deviations of groups from $b_m$              1      $S_3$ = 835.6                                835.6

Difference between $b_a$ and $b_m$           1      $S_4$ = 19.3                                  19.3

Totals                                      33      $S_1 + S_2 + S_3 + S_4$ = 2078.4



A comparison of the quotient 36.1 (29 d.f.) with the quotient of the remaining items,
257.6 (4 d.f.), indicates that there are real differences between classes. A single regression
equation will not represent all three class-relations. A comparison of the deviations from
regressions, 36.1 (29 d.f.), with the differences of regressions among themselves, 87.7
(2 d.f.), does not reject the hypothesis of equality of regressions within groups. We therefore
compare the deviations from $b_a$, 39.5 (31 d.f.), with the deviations of groups from $b_m$,
835.6 (1 d.f.). This is significant, suggesting that the hypothesis of linearity of regression
of group-means should be rejected.
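The sums entering this analysis can be recomputed from the rows of Table 24.10, using the fact that the sum of squares about a fitted line is $C_{22} - bC_{12}$ with $b = C_{12}/C_{11}$. The sketch below is ours (the helper `residual_ss` and the variable names are illustrative). It takes $C_{112}$ as 1052.4, the value that follows from the raw sums $2802 - 162^2/15$. Because it carries the regression coefficients unrounded, its results differ from the printed sums by up to about 0.2.

```python
# Rows of Table 24.10 as (C11, C22, C12).
group_rows = [
    (612.4, 417.6, 299.2),      # within first group
    (1052.4, 1697.73, 1213.4),  # within second group
    (760.4, 1238.4, 760.4),     # within third group
]
within = tuple(sum(row[i] for row in group_rows) for i in range(3))
between = (83.09, 1005.01, 118.57)
total = (2508.29, 4358.74, 2391.57)


def residual_ss(c11, c22, c12):
    """Sum of squares about the fitted line: C22 - b*C12 with b = C12/C11."""
    return c22 - c12 * c12 / c11


s1 = sum(residual_ss(*row) for row in group_rows)   # deviations from the b_j
s1_plus_s2 = residual_ss(*within)                    # deviations from b_a
s2 = s1_plus_s2 - s1                                 # differences among the b_j
s3 = residual_ss(*between)                           # deviations of groups from b_m
s_total = residual_ss(*total)                        # deviations from b_0
s4 = s_total - s1_plus_s2 - s3                       # difference between b_a and b_m

# Quotients compared in the text (sum divided by its degrees of freedom).
print(f"S1 = {s1:.1f}, quotient = {s1 / 29:.1f}")
print(f"S2 = {s2:.1f}, quotient = {s2 / 2:.1f}")
print(f"S1 + S2 = {s1_plus_s2:.1f}, quotient = {s1_plus_s2 / 31:.1f}")
print(f"S3 = {s3:.1f}, S4 = {s4:.1f}, total = {s_total:.1f}")
print(f"remaining items: {(s2 + s3 + s4) / 4:.1f} on 4 d.f.")
```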

The general result is to confirm our suspicion of heterogeneity. The correlation
coefficients between x and y are:

Within first group ..... 0.592

  "    second group .... 0.908

  "    third group ..... 0.784

Within groups .......... 0.797

Between groups ......... 0.410

Total .................. 0.722


Again the deviations between groups stand out as indicating heterogeneity.
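Each of these coefficients is $C_{12}/\sqrt{C_{11}C_{22}}$ for the corresponding row of Table 24.10. A minimal sketch of the arithmetic follows; the names are ours, and $C_{112}$ is taken as 1052.4, the value implied by the raw sums. The results agree with the printed coefficients to within about a unit in the third decimal place.

```python
import math

# (C11, C22, C12) for each row of Table 24.10.
rows = {
    "within first group": (612.4, 417.6, 299.2),
    "within second group": (1052.4, 1697.73, 1213.4),
    "within third group": (760.4, 1238.4, 760.4),
    "within groups": (2425.2, 3353.73, 2273.0),
    "between groups": (83.09, 1005.01, 118.57),
    "total": (2508.29, 4358.74, 2391.57),
}


def correlation(c11, c22, c12):
    """Product-moment correlation from corrected sums of squares and products."""
    return c12 / math.sqrt(c11 * c22)


r = {name: correlation(*c) for name, c in rows.items()}
for name, value in r.items():
    print(f"{name:20s} {value:.3f}")
```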

24.32. The analysis of covariance may be extended to the case where there is more
than one independent variate. The regression coefficients are found in the usual way,
and the sums of squares after regressions have been removed can be found and compared
on the usual hypotheses. Suppose, for instance, there are two independent variates and
a classification giving an analysis between classes and residual. We may represent the
analysis thus:

                              Sum of Squares                Sum of Products
                   d.f.     $x_1^2$   $x_2^2$   $y^2$       $x_1 x_2$   $y x_1$   $y x_2$

Between classes     n         A         B         C            P          Q         R

Residual            n'        A'        B'        C'           P'         Q'        R'

Totals              n''       A''       B''       C''          P''        Q''       R''

Our regressions are then:

                   $b_1$                                                $b_2$

Between classes    $\dfrac{BQ - PR}{AB - P^2}$                          $\dfrac{AR - PQ}{AB - P^2}$

Residual           $\dfrac{B'Q' - P'R'}{A'B' - P'^2}$                   $\dfrac{A'R' - P'Q'}{A'B' - P'^2}$

Totals             $\dfrac{B''Q'' - P''R''}{A''B'' - P''^2}$            $\dfrac{A''R'' - P''Q''}{A''B'' - P''^2}$


The sums of squares C can then be reduced by eliminating regressions, i.e. by subtracting
$Qb_1 + Rb_2$, giving

$$ C - \frac{BQ^2 - PQR}{AB - P^2} - \frac{AR^2 - PQR}{AB - P^2}
   = \frac{ABC - AR^2 - BQ^2 - CP^2 + 2PQR}{AB - P^2} . \tag{24.65} $$

This and the analogous quantities with primes give independent estimators of the
variance of the residual element, and a comparison to test homogeneity may be made in
the usual way.

24.33. In a case such as that of Example 24.7 it is evident that a comparison of
y-means between groups is affected by what we know about the x-values. If we know nothing
about the latter, comparison of the y's is a univariate problem and can be treated by the
methods already discussed, the difference of means, for example, being tested by the use
of standard errors or the t-test. But suppose that our x's themselves are found to be
different between groups and that there is significant correlation between x and y. Then
it is possible that the relation, if any, between y's in different groups is not, so to speak,
an inherent quality of the variation of y, but is merely a reflection of their dependence on
the x's, which happen to exhibit significant differences. In Example 24.7, differences in
proficiency between groups may be due simply to differences of ability which were present
before the training began and, if so, should be shown by differences between groups in the
preliminary scores. We should not then be able to conclude from proficiency scores alone
that training in one group had a more marked effect than in another. The differences
were there before the training was applied.

24.34. If, then, we require to consider the effects of training alone on the groups,
we may "correct" the y-values by deducting the estimates

$$ Y_{ij} = \bar{y}_{..} + b_0 (x_{ij} - \bar{x}_{..}) \tag{24.66} $$

or other more general regression equations. This, so to speak, allows for differences due
to variations of the x-variate.

Assuming that one linear regression equation adequately describes the relationship
between y and x, so that the corrected values are

$$ y_{ij} - Y_{ij} = y_{ij} - \bar{y}_{..} - b_0 (x_{ij} - \bar{x}_{..}), \tag{24.67} $$

we see that the difference of the corrected means of two classes $\bar{y}_{.j}$ and $\bar{y}_{.k}$ is

$$ \bar{y}_{.j} - \bar{y}_{.k} - b_0 (\bar{x}_{.j} - \bar{x}_{.k}). \tag{24.68} $$


This may be regarded as the sum of two parts which are independent. The estimated
variance of the first part, $\bar{y}_{.j} - \bar{y}_{.k}$, is $2s^2/q$, where $s^2$ is the mean-square of the
residual after correcting for regression and the means $\bar{y}_{.j}$ and $\bar{y}_{.k}$ are both based on
q members. Similarly the variance of the regression coefficient is $s^2/A$, where A is the sum
of squares of the x-variate entering into the residual row of the analysis. Regarding the x's
as fixed from sample to sample, so that our inference is conditional, we see that the variance
of the difference (24.68) is given by

$$ s^2 \left\{ \frac{2}{q} + \frac{(\bar{x}_{.j} - \bar{x}_{.k})^2}{A} \right\} . \tag{24.69} $$

The ratio of the difference to the square root of this expression is distributed as Student's
t, with degrees of freedom one fewer in number than those of the original residual.
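The arithmetic of (24.68) and (24.69) may be sketched as follows. The function name and arguments are ours. The illustrative call compares the first and third groups of Example 24.7 (each of q = 10 recruits) and rests on assumptions not made in the text: the within-groups row of Table 24.10 is taken as the residual, so that the regression coefficient is $b_a$ = 0.9372 and A = 2425.2, with $s^2$ = 39.5 on 31 degrees of freedom from Table 24.11. It is intended purely to show the computation, not as a conclusion about the data.

```python
import math


def corrected_difference(ybar_j, ybar_k, xbar_j, xbar_k, b, s2, q, a):
    """Corrected difference of two class means, its variance (24.69) and the t-ratio.

    b  : regression coefficient used for the correction
    s2 : residual mean-square after correcting for regression
    q  : number of members on which each class mean is based
    a  : sum of squares of x in the residual row of the analysis
    """
    difference = (ybar_j - ybar_k) - b * (xbar_j - xbar_k)
    variance = s2 * (2.0 / q + (xbar_j - xbar_k) ** 2 / a)
    return difference, variance, difference / math.sqrt(variance)


# Group means of Example 24.7: origin 50 plus (sum about the origin) / n.
ybar_1, ybar_3 = 50 + 42 / 10, 50 + 124 / 10
xbar_1, xbar_3 = 50 + 94 / 10, 50 + 134 / 10

diff, var, t = corrected_difference(
    ybar_1, ybar_3, xbar_1, xbar_3, b=0.9372, s2=39.5, q=10, a=2425.2
)
print(f"corrected difference = {diff:.3f}, variance = {var:.3f}, t = {t:.3f}")
```

The uncorrected difference of proficiency means is -8.2; the correction for the preliminary scores reduces it to about -4.45.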

24.35. Similarly, if we have two independent variables $x_1$ and $x_2$, the corrected
difference of y-means is

$$ \bar{y}_{.j} - \bar{y}_{.k} - \{ b_1 (\bar{x}_{1j} - \bar{x}_{1k}) + b_2 (\bar{x}_{2j} - \bar{x}_{2k}) \} \tag{24.70} $$

where temporarily we write $\bar{x}_{1j}$ for the mean of the variate $x_1$ in the jth class, and so on.
The variance of the part in curly brackets may be derived by considering the variance of
the general expression $\lambda b_1 + \mu b_2$. From the equations for $b_1$ and $b_2$ we have

$$ b_1 = \frac{B \sum (y x_1) - P \sum (y x_2)}{AB - P^2}, \qquad
   b_2 = \frac{-P \sum (y x_1) + A \sum (y x_2)}{AB - P^2}, \tag{24.71} $$

where, as in 24.32, A and B are the sums of squares for $x_1$, $x_2$, and P is the cross-product.
Thus the coefficient of any y in $\lambda b_1 + \mu b_2$ is

$$ \frac{(\lambda B - \mu P) x_1 + (\mu A - \lambda P) x_2}{AB - P^2} . $$

Since the y's are independent the estimated variance of $\lambda b_1 + \mu b_2$ is

$$ s^2 \, \frac{A (\lambda B - \mu P)^2 + 2P (\lambda B - \mu P)(\mu A - \lambda P) + B (\mu A - \lambda P)^2}{(AB - P^2)^2}
   = s^2 \, \frac{\lambda^2 B - 2 \lambda \mu P + \mu^2 A}{AB - P^2} . \tag{24.72} $$

Thus for the estimated variance of the corrected difference (24.70) we have

$$ s^2 \left\{ \frac{2}{q} + \frac{\lambda^2 B - 2 \lambda \mu P + \mu^2 A}{AB - P^2} \right\} , \tag{24.73} $$

where $\lambda = \bar{x}_{1j} - \bar{x}_{1k}$ and $\mu = \bar{x}_{2j} - \bar{x}_{2k}$. As usual, the difference divided by the square
root of this quantity may be tested in the t-distribution.
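The step leading to (24.72) can be checked numerically: the sum of the squared coefficients of the individual y's in $\lambda b_1 + \mu b_2$ should equal $(\lambda^2 B - 2\lambda\mu P + \mu^2 A)/(AB - P^2)$, the factor which multiplies $s^2$. In the sketch below the values of $x_1$, $x_2$, $\lambda$ and $\mu$ are invented for illustration and all names are ours.

```python
# Hypothetical values of the two independent variates (not from the text).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

# Deviations from the means, and the sums of squares and products A, B, P.
m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
d1 = [u - m1 for u in x1]
d2 = [w - m2 for w in x2]
A = sum(u * u for u in d1)
B = sum(w * w for w in d2)
P = sum(u * w for u, w in zip(d1, d2))
det = A * B - P * P

lam, mu = 1.5, -0.5  # arbitrary multipliers of b1 and b2

# Coefficient of each y in lam*b1 + mu*b2, as given in the text.
coefficients = [
    ((lam * B - mu * P) * u + (mu * A - lam * P) * w) / det
    for u, w in zip(d1, d2)
]

sum_of_squares = sum(c * c for c in coefficients)
closed_form = (lam * lam * B - 2 * lam * mu * P + mu * mu * A) / det

print(sum_of_squares, closed_form)
```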
24.36. This account of the analysis of variance and covariance has not attempted
to cover all the applications of the method in particular directions. We have concentrated
so far as possible on the fundamental ideas and the broad lines of analysis to which they
lead. Some further developments will be given in later chapters, but we must refer the
reader who wishes for a more complete acquaintance with the subject to the references given
at the end of this chapter and the preceding. We will conclude our exposition with three
final comments.

(a) Part of our hypothesis throughout has been that the residual element C has constant
variance from one subclass to another. In Chapter 26 we shall discuss methods of testing
homogeneity in residual variance. For completeness we might perhaps have anticipated
some of these tests in the present chapter, at least to the extent of exemplifying their use.
We have not done so mainly for reasons of economy in space; but the omission of mention
of the point in foregoing examples should not lead the reader to overlook (as many writers
do overlook) the necessity for testing variance-homogeneity where possible, if it is required
as part of the hypothesis.

(b) In the majority of our examples we have proceeded at once to analyses of variance
or covariance without dwelling on points which would require attention in any practical
inquiry. For instance, since the primary function of many variance-analyses is to test
the homogeneity of a set of class-means, the first stage would be to compute those means
and examine whether they suggest any lack of homogeneity on intuitive grounds. Again,
if heterogeneity is established, consideration of the means themselves, or of the primary
data, will sometimes show how it arises. The student must never lose sight of his primary
material.

(c) Elaborating this point to some extent, we would emphasise that the analysis of
variance, like other statistical techniques, is not a mill which will grind out results
automatically without care or forethought on the part of the operator. It is a rather delicate
instrument which can be called into play when precision is needed, but requires skill as
well as enthusiasm to apply to the best advantage. The reader who roves among the
literature of the subject will sometimes find elaborate analyses applied to data in order to
prove something which was almost obvious from careful inspection right from the start;
or he will find results stated without qualification as "significant" without any attempt
at critical appreciation. This is not the occasion to deliver a homily on the necessity for
self-discipline in the use of advanced theoretical techniques, but the analysis of variance
would provide quite a good text for a discourse on that interesting subject.


NOTES AND REFERENCES

For the analysis of variance where subclass frequencies are unequal, see Brandt (1933)
and an important paper by Yates (1934a). Wilks (1938e) has considered the subject from
the theoretical viewpoint and exhibited the main results determinantally. For the missing
plot technique see Allan and Wishart (1930) and Yates (1933b). For the analysis of
covariance, see Fisher's Statistical Methods, Bartlett (1934a), an appendix by E. S. Pearson
to a paper by Wilsdon (1934), Brady (1935), Wishart (1936), and Day and Fisher (1937).
The last-mentioned paper works through a practical example in some detail and will
repay study.

See also references to the previous chapter.





EXERCISES 

24.1. For a two-way classification with one member in each subclass show that,
for normal variation,

$$ E \{ (x_{j.} - x_{..})(x_{.k} - x_{..}) \} = 0, $$

and hence that the sums $\sum_j (x_{j.} - x_{..})^2$ and $\sum_k (x_{.k} - x_{..})^2$ are independent. Examine
how this breaks down for the non-orthogonal case.

24.2. Verify the arithmetic of Example 24.6. 

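24.3. Generalise formula (24.73) in the following way. If there are m independent
variates, the variance of corrected differences is

$$ s^2 \left\{ \frac{2}{q} + \frac{1}{\Delta} \sum_{r,s=1}^{m} \lambda_r \lambda_s A_{rs} \right\} $$

where $\lambda_r = \bar{x}_{rj} - \bar{x}_{rk}$, and where $A_{rs}$ is the cofactor of $a_{rs}$ in the determinant
$\Delta = |a_{rs}|$, and $a_{rs} = \sum x_r x_s$ summed over the sample.

(Wishart, 1936.)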

24.4. Derive by the analysis of variance the test of a regression coefficient given 

in 22.19. 



CHAPTER 25

THE DESIGN OF SAMPLING INQUIRIES


Influence of Theory on Sampling Design

25.1. The reader who is accustomed to handling the results of a sampling investigation
as they appear in everyday statistical work may have wondered more than once in previous
chapters whether theory was not reaching out too far in advance of practice. It is true
that for certain types of experimental inquiry, notably in agricultural and biological research,
the precision of exact statistical tests does not seem out of place; but in economic or social
statistics, for example, there is often so much error and imperfection in the raw data that
the application of refined methods of analysis would be a waste of time. It is clearly
useless, and may even be dangerous, to exercise an elaborate mathematical technique on
data which are suspect from the very start of the inquiry. If our theory is to be really
serviceable to the statistician and not merely an enticing mental exercise it must be capable
of solving practical problems.


25.2. Now it has to be admitted that much of the material with which statisticians
have to work at the present day cannot be treated by the methods expounded in the foregoing
pages when sampling questions are concerned. The commonest reason, but by no
means the only one, is that the sampling process by which the data were obtained was
biassed. In such cases the statistician has to lay aside the refined implements of his craft
and do the best he can with his refractory material in the light of his own judgment and
commonsense. A good deal of current statistical work is of this kind, and there is even
a section of thought which is inclined to depreciate the advanced theory of the subject as
"academic" in the sense that it is too remote from practical affairs to be worth studying.
The misunderstanding is not likely to be removed by the counter-accusation sometimes
launched by theoreticians that the theory is quite capable of being applied by anyone who
has the ability to comprehend it.

25.3. Fortunately there is a growing realisation that the two points of view can
be reconciled by collecting the data in such a form that the theory can be applied to
it. If only enough care is taken at the initial stages of an inquiry there is no need for the
appearance of imperfect data which defy exact analysis. Knowing beforehand what
statistical instruments are at our disposal, and armed with a clear understanding of what
questions we are trying to answer, we can frequently frame the investigation so as to
maximise the information acquired with the minimum of effort. In short, the scope and nature
of our theory itself dictates, to some extent, the form which the sampling inquiry should
assume. In former times the statistician was usually asked to analyse
data which were collected by inexpert agents, frequently for quite different purposes.
Nowadays he is still in the same position in some respects, but sometimes he is called in to
advise on the design of the inquiry and can, within limits, determine the form in which the
data are collected. He can make his theory applicable by selecting his sample in the
proper way.

25.4. The general theory of the design of sampling inquiries has not progressed far
enough for us to be able to give a systematic account of it in this chapter. In some fields,
particularly that of agricultural experimentation, it has reached quite an advanced degree
of perfection ; in others there remain many problems unsolved and possibly many more 
which have not yet even been formulated. At the risk of some discontinuity of treatment, 
therefore, we shall only give in this chapter a number of instances in which theoretical con- 
siderations exert a considerable effect on the scope of a sampling inquiry, in order to illus- 
trate the field to be covered. There are, of course, many factors which ultimately deter- 
mine the form of an investigation, such as cost and expenditure of time, but they will 
not concern us here. For the present we shall be concerned solely with the extent to which 
theoretical considerations contribute to all the factors that have to be taken into account 
when an inquiry is designed. 

Some Preliminary Points 

25.5. There are certain preliminary points which, though obvious enough when stated 
explicitly, are often overlooked and cause a good deal of bad design. 

(a) The fundamental object of sampling is to obtain information about a population,
and it is of the first importance to begin with a clear idea of what that population 
is. Imagine, for instance, that we are asked to ascertain whether pasteurised milk has 
a different feeding value from raw milk. In what population is this inquiry to be made : 
among children ? among the inhabitants of the British Isles ? among those who habitually 
drink milk or those who do not ? among townspeople or among country folk ? and so 
on. Again, suppose that we are given a new variety of barley and wish to know whether 
it has a heavier yield than a previously known type. Do we mean heavier in the usual 
barley-growing areas ? in every kind of climate or on the average over a series of different 
climatic conditions ? when subject to the same manurial treatments as those in current 
use ? and so on. 

(b) In a similar way, it is necessary to have an equally clear idea of what we are trying
to find out about the population. In our example of raw and pasteurised milk, are we 
content to know that there is (or is not) a differential effect for children as a whole ? or do 
we wish to ascertain whether any such effect varies at different ages, between sexes, or 
according to nutritional standards ? What exactly should we like to know ? It is no use 
returning the facile reply “ all about it ” to this query, for our information must be limited 
in virtue of the finite size of our sample. We must make up our minds what information 
we require and which questions have priority if it becomes necessary to sacrifice some of 
them for practical reasons. 

(c) Thirdly, we should consider what we know already about our population. This 
point becomes of particular importance when our prior knowledge indicates heterogeneity, 
for then we may, in effect, have to divide the population into sub-groups and sample separ- 
ately from each. In our milk example, it is to be expected that children of different ages 
may react differently, or that children from lower-class schools may respond differently 
from those in middle-class schools. Or again, in our barley example, the two varieties 
may compare quite differently on Hertfordshire loam and on Lincolnshire chalk. It would 
be misleading to lump all the comparisons together when we have strong reason to suspect 
heterogeneity beforehand. In effect, prior knowledge of this kind frequently dictates the 
types of question we ask under (b), and the two are often different facets of the same problem.

(d) As an extension of the same point, we may notice that prior knowledge about the
population sometimes indicates what sort of averages to use and what sort of tests of
significance it is proper to apply. Crop-yields, for instance, are known to be distributed
in a form approaching the normal, so that arithmetic means are good estimates of parent
means and the tests based on normal theory may be applied. Accident statistics, on the
other hand, are often distributed in a modified Poisson form ; income statistics in a J-shaped 
form, and so forth. 

(e) A specification of the population and a decision as to the precise object of the 
inquiry will usually determine certain parameters which it is required to estimate or certain 
hypotheses for test. In general the problem is one of estimation, but not necessarily so. 
In our case of pasteurised and raw milk, for instance, we should probably wish to know 
the exact amount of the difference between the effects of the two (a matter of estimation), 
not merely whether a difference existed (a matter of significance). We then wish to know, 
before the inquiry begins, whether the estimates we shall have are going to be accurate 
enough for our purpose ; or alternatively, if the sample is of a given size, how accurate they 
will be. It may not always be possible to answer such a question completely beforehand, 
since the sampling variances will in general depend on quantities which have to be estimated 
when the data are available, but it is always useful to consider in a general way what sort 
of magnitudes would be shown as significant and what values would leave us still in reason- 
able doubt. As a rule, matters such as this are closely related to sample size. 

(f) Finally, our estimates will be subject to experimental error and, in development
of the last point, we have to try to find the form of experimental design which, while answer- 
ing our questions, does so with the minimum error. From a slightly different standpoint, 
if we can determine the amount of error which is admissible, the problem is to find the 
design which achieves no more than that error with the minimum expenditure of effort. 
Furthermore, we require to be able to estimate the extent of probable errors. In short, we 
require an efficient design, just as the engineer requires an efficient engine or the aircraft 
designer an efficient form of airscrew, and for exactly the same reasons. 

25.6. To sum up, our primary task in embarking on a sampling inquiry is to ascertain
as accurately as possible what is the population under examination, and what is the
information about it which we require. If, as usually is the case, that information concerns
statistical characteristics such as means and variances, or more generally frequency-distributions,
our second task is to design an inquiry which will provide estimates of these unknown
quantities and will, at the same time, provide estimates of their sampling error. It is not
always possible, as we shall see later, to obtain full satisfaction in the reduction of error
and the estimation of error simultaneously. Increased accuracy of estimation may mean
loss of precision in our estimate of sampling error, so that although we are nearer the truth
we do not know how near. There does not appear to be any single rule which will cover
all the cases that can arise. We shall refer to a particular case of some interest in 25.39.


Stratified Sampling

25.7. We consider at the outset a case of fairly frequent occurrence in the sampling
of existent populations. Suppose we are interested in the mean value of a variate x in
some population $\Pi$; and that we know, or suspect, that the population is heterogeneous
in the sense that we can delimit sub-populations $\Pi_1, \Pi_2, \ldots, \Pi_k$ in which the distributions
according to x may differ. This type of case might, for example, arise if we were sampling
the population of a town for income, there being districts, wards or even streets which are
known to be inhabited by classes living at different income-levels.

Practical considerations alone may require that we draw a prescribed portion of the
sample from each sub-population. For instance, with a town of 500,000 inhabitants it
would be most tedious to sample by using random numbers applied to the whole town.
We should probably divide the work among districts and blocks and select random samples
within the blocks. This, however, is not to be confused with the division of the town into
relatively homogeneous districts because of its heterogeneity. Either process is called
stratification. The problem we shall discuss is this: If we have decided to draw a total
sample of n members, and can assign at will the number $n_i$ drawn from the ith stratum
$\Pi_i$, subject to the condition $\sum (n_i) = n$, how should we choose the numbers $n_i$, or need we
choose them at all? Will our estimate of the mean value of x be better if we merely choose
n members at random from $\Pi$, or can we improve it by controlling the numbers $n_i$ and not
merely leaving them to chance?

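25.8. Let $x_{ij}$ be the jth member of the sample from the ith sub-population, and let
the latter contain a number $N_i$ of members with mean $\mu_i$ and variance $\sigma_i^2$. If $\mu$ is the
mean of $\Pi$ we shall have

$$ \mu = \frac{1}{N} \sum_{i=1}^{k} N_i \mu_i , \tag{25.1} $$

where $N = \sum N_i$ is the number of members of $\Pi$.

We shall now seek for parameters $\lambda_{ij}$ such that our estimator of $\mu$, say t, is given by

$$ t = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \lambda_{ij} x_{ij} , \tag{25.2} $$

that is to say, is a linear estimator in the observed variate-values. We shall seek for that
estimator which is unbiassed and has minimum variance, i.e. for which

$$ E(t) = \mu \tag{25.3} $$

$$ E \{ t - E(t) \}^2 = \text{minimum}. \tag{25.4} $$

Substituting from (25.2) and (25.1) in (25.3), we find

$$ E \sum_{i,j} \lambda_{ij} x_{ij} = \frac{1}{N} \sum_i N_i \mu_i $$

and since $E(x_{ij}) = \mu_i$ this gives

$$ \sum_i \mu_i \sum_j \lambda_{ij} = \frac{1}{N} \sum_i N_i \mu_i . \tag{25.5} $$

For this to be generally true we must have

$$ \sum_{j=1}^{n_i} \lambda_{ij} = \frac{N_i}{N} , \tag{25.6} $$

a first condition on the $\lambda$'s. If $\bar{\lambda}_i$ is the mean of $\lambda_{ij}$ in the ith set we have

$$ \bar{\lambda}_i = \frac{N_i}{N n_i} . \tag{25.7} $$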

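Now consider (25.4). The variance of t is the sum of k variances, for the samples from
sub-populations are independent. Consider then the variance of $\sum_j \lambda_{ij} x_{ij}$, remembering
that the $n_i$ members are drawn from a finite sub-population of $N_i$ members, so that any two
of them have covariance $-\sigma_i^2 / (N_i - 1)$. We have

$$ \text{variance} = E \Big\{ \sum_j \lambda_{ij} (x_{ij} - \mu_i) \Big\}^2 $$

$$ = E \Big\{ \sum_j \lambda_{ij}^2 (x_{ij} - \mu_i)^2 + \sum_{j \neq k} \lambda_{ij} \lambda_{ik} (x_{ij} - \mu_i)(x_{ik} - \mu_i) \Big\} $$

$$ = \frac{\sigma_i^2}{N_i - 1} \Big\{ N_i \sum_j \lambda_{ij}^2 - \Big( \sum_j \lambda_{ij} \Big)^2 \Big\} $$

$$ = \frac{\sigma_i^2}{N_i - 1} \Big\{ n_i \bar{\lambda}_i^2 (N_i - n_i) + N_i \sum_j (\lambda_{ij} - \bar{\lambda}_i)^2 \Big\} . \tag{25.8} $$

This is clearly minimised only if

$$ \sum_j (\lambda_{ij} - \bar{\lambda}_i)^2 = 0, \tag{25.9} $$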


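that is, if all the $\lambda$'s for any sub-population are equal. This is what we should expect on
intuitive grounds, for there is no reason for weighting the sample members differently in
the same sub-sample.

Our minimal variance, say v, is then given from (25.8), by summing over i, as

$$ v = \sum_i \frac{\sigma_i^2}{N_i - 1} \, n_i \bar{\lambda}_i^2 (N_i - n_i)
     = \frac{1}{N^2} \sum_i \frac{\sigma_i^2 N_i^2 (N_i - n_i)}{(N_i - 1) \, n_i} . \tag{25.10} $$

This is a minimum for variations in $n_i$ subject to $\sum n_i = n$ if

$$ \frac{\partial}{\partial n_i} \Big( v - p \sum n_i \Big) = 0, $$

where p is an undetermined constant. This yields almost at once

$$ n_i^2 \propto \frac{\sigma_i^2 N_i^3}{N_i - 1} . \tag{25.11} $$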
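The allocation (25.11) and the variance (25.10) are easily evaluated. The sketch below is ours: the function names and the stratum sizes, standard deviations and total sample number are hypothetical, chosen only to illustrate the comparison with an allocation proportional to the $N_i$. It treats the $n_i$ as continuous and ignores the requirements that they be whole numbers not exceeding $N_i$.

```python
import math


def stratified_variance(sizes, sigmas, allocation):
    """Variance (25.10) for stratum sizes N_i, standard deviations sigma_i and sample numbers n_i."""
    n_pop = sum(sizes)
    return sum(
        sigma ** 2 * big_n ** 2 * (big_n - n_i) / ((big_n - 1) * n_i)
        for big_n, sigma, n_i in zip(sizes, sigmas, allocation)
    ) / n_pop ** 2


def optimal_allocation(sizes, sigmas, n):
    """Sample numbers with n_i^2 proportional to sigma_i^2 N_i^3 / (N_i - 1), summing to n."""
    weights = [sigma * math.sqrt(big_n ** 3 / (big_n - 1)) for big_n, sigma in zip(sizes, sigmas)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical strata: sizes N_i and standard deviations sigma_i.
sizes = [5000, 3000, 2000]
sigmas = [10.0, 20.0, 40.0]
n = 200

best = optimal_allocation(sizes, sigmas, n)
proportional = [n * big_n / sum(sizes) for big_n in sizes]

v_best = stratified_variance(sizes, sigmas, best)
v_proportional = stratified_variance(sizes, sigmas, proportional)

print("optimal allocation     :", [round(x, 1) for x in best])
print("variance (optimal)     :", v_best)
print("variance (proportional):", v_proportional)
```

In this illustration the most variable stratum receives the largest share of the sample although it is the smallest, and the variance falls below that given by proportional allocation.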


25.9. If we know the population variances $\sigma_i^2$ and the numbers $N_i$ this equation
determines the numbers $n_i$; but in practice it is rather unlikely that we should know the
variances without knowing the means, in which case we should not have to sample to find
the mean of the whole population. Our result is not, however, useless. In the first place
we find for the estimator t

$$ t = \sum_i \sum_j \frac{N_i}{N n_i} x_{ij} = \frac{1}{N} \sum_i N_i \bar{x}_i , \tag{25.12} $$

where $\bar{x}_i$ is the mean of the sample from the ith stratum,
so that the estimate is a weighted average of the sample means, the weights being
proportional to the population numbers $N_i$, not to the numbers $n_i$. Secondly, without knowing
the variances $\sigma_i^2$ exactly, we may sometimes reach approximations from prior knowledge
of the populations. Such values, without giving absolute accuracy, will at least represent
improvements on selecting the $n_i$ by chance.
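The estimator (25.12) is simply the mean of the stratum sample means weighted by the population numbers. A minimal sketch follows, with stratum sizes and observations invented for illustration and a function name of our own.

```python
def stratified_mean(sizes, samples):
    """Estimator (25.12): sample means of the strata weighted by the population numbers N_i."""
    n_pop = sum(sizes)
    return sum(
        big_n * (sum(sample) / len(sample))
        for big_n, sample in zip(sizes, samples)
    ) / n_pop


# Hypothetical strata sizes and samples.
sizes = [500, 300, 200]
samples = [[10, 12, 14], [20, 22], [30, 34, 38, 42]]

t = stratified_mean(sizes, samples)

# For contrast: the plain mean of all observations, which weights by the n_i instead.
pooled = sum(sum(s) for s in samples) / sum(len(s) for s in samples)

print(t, pooled)
```

Here the third stratum supplies nearly half the sample although it holds only a fifth of the population, so the unweighted mean is pulled towards its values while (25.12) is not.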





25.10. If the numbers $N_i$ are effectively infinite the formulae simplify, and, for
instance, instead of (25.11) we have

$$ n_i \propto \sigma_i N_i , \tag{25.13} $$

the sample number varying with the standard deviation in the stratum concerned, as well
as its number of members.

25.11. If there is no information available at all about the variances $\sigma_i^2$ the most
reasonable course in applying (25.11) appears to be to suppose them all equal. In such
a case, for large $N_i$, we have

$$ n_i \propto N_i , \tag{25.14} $$

or the sampling numbers are proportional to the population numbers. This is what we
might expect on intuitive grounds. If the populations are infinite the $n_i$ are equal, which
again is in accordance with intuitive ideas.

25.12. The above will serve as an illustration of the way in which theoretical requirements 
can influence the scope of an inquiry conducted among an existent population. By 
seeking for an estimator with minimum variance we have been led to expressions determining 
the allocation of sample numbers among the different strata; and incidentally, of 
course, we have derived expressions for the minimum variance, so that the maximum 
possible precision can be ascertained. The fact that some of our results depend on unknown 
constants suggests that in some circumstances it may be worth while conducting a preliminary 
or “ pilot ” inquiry in order to estimate the unknowns and hence to improve the 
precision of the main inquiry which is to follow. The possibilities of such pilot surveys 
have yet to be explored, but the technique appears to merit serious investigation. 

25.13. In passing, we may mention one other topic of great practical importance on 
which theory can throw a good deal of light, that of optimum size of a sampling unit. In 
sampling a human population of a town, for instance, need we take individuals as our 
units ? It would be easier to sample households, or streets, or even whole districts ; but 
do we lose anything by this method, and if so, how much ? Furthermore, the grouping of 
individuals into units of larger size sometimes has a peculiar effect on correlations which 
may lead to erroneous conclusions, and a theoretical investigation may be required to safeguard 
against error. We shall not pursue the subject further here (the sampling problem 
would require a book in itself), but the reader who is interested may like to consult some 
of the papers referred to at the end of the chapter. 

The Design of Experiments 

25.14. For an existent population the flexibility of sampling technique is somewhat 
limited. We are given an aggregate of values, some of which are to be extracted for scrutiny, 
and no manipulation of the sampling can tell us more than exists, so to speak, already 
inscribed upon the population itself. Consequently the main line of endeavour in such 
cases lies in estimating with the greatest accuracy (which is largely a matter of choosing 
the right statistics and minimising sampling variability), or in ensuring that sufficient 
material is available to enable the requisite comparisons to be made with significance 
(which is largely a matter of sample size and selecting the most suitable tests of significance). 
Nothing can alter the population, and theory will, as a rule, only react upon the sampling 
process by some such method as has already been exemplified, e.g. in dictating that the 
sampling must be random, in stratifying the population before the sampling is carried out, 
and in deciding how limited resources can be expended to the best advantage. 

25.15. For hypothetical populations there are often wider possibilities, for the nature 
of the inquiry may itself determine which populations are to be studied, and the populations 
may, to a certain extent, be set up at will. For instance, if we are interested in an inquiry 
into the relationship between income and size of family in the United Kingdom, the population 
already exists and we cannot go outside it ; whereas if we wish to discuss the effect 
of a poison on bacterial growth or of a fertiliser on the yield of barley we can not only 
reproduce experimental data ad libitum but can arrange the inquiry so as to confine it to 
certain populations (e.g., by considering only a given type of bacterium in fixed nutritional 
circumstances or at fixed temperatures), or we may extend the domain of consideration as 
far as purely practical limitations will allow (e.g., by growing barley in new surroundings 
or in new climates). This is rather a pretentious way of saying that we may experiment 
in a domain which, within limits, can be assigned at will. The statistician has a much 
greater scope for ingenuity in the design of experiments than in the design of sampling 
inquiries on existent populations because of the greater degree of control over the population 
under examination. 

25.16. In the classical ideal experiment, only the factors under consideration were 
allowed to vary, other conditions being kept as constant as laboratory practice would allow ; 
in investigations concerning the relation between resistance and current in an electric 
circuit, for instance, attempts would be made to keep factors such as temperature and 
external magnetic effects strictly constant. It would be recognized that there would be 
residual errors which would affect the exactitude of the results, but these would be measurable 
on certain assumptions. 

25.17. Statistical theory can, of course, deal with such cases, but it can also go farther 
and often wishes to do so. In the first place, it frankly admits the existence not only of 
experimental error (in the sense of aberration from a “ true ” value) but of the much wider 
type of variation which gives rise to frequency-distributions in practice. Instead of isolating 
particular factors for study, it may wish to give full play to the disturbances which arise 
in practice in order to investigate what happens in “ natural ” conditions. For this reason, 
statistical experiments are often complex in the sense that a number of factors are allowed 
to vary simultaneously. 

Secondly, the admission of outside influences which together make up what is generally 
called experimental error implies that it should be possible to estimate the extent of such 
error from the data themselves. We wish to obtain, not the functional relations between 
variables which may only exist under artificial conditions, but the stochastic relations 
observed in practice. 

25.18. The effect of this on experimental design is that the hypothetical population 
we consider is often a rather general one. Taking the case of trials of a new variety of 
barley as an example, we should wish to compare its yields with those of other varieties 
in different soil conditions, with different manurial treatments, in different years (so as to 
get variations in climate), and so on. Furthermore, to obtain estimates of the error due 
to other factors we usually have to replicate the experiment. A great number of intercomparisons 
fall to be made, and the process of design is essentially that of finding a form 
of experiment which will permit all these comparisons and yet save as much unnecessary 
labour as possible. 

Orthogonality 

25.19. To reduce the discussion to more concrete terms we will consider the testing 
of a new variety of barley. In order to study its behaviour under different soil conditions 
we will select a number of areas in which barley is grown and choose a block of ground in 
each. This will give us inter-soil comparisons. We will also arrange to carry the experiment 
on for a period of years, so that climatic variations may also be compared. The 
other factor in which we are interested is the response to certain manures, which we will 
take to be dung (D), potash (K), nitrogen (N), and phosphates (P). 

Consider any block at any one place in any year. We will decide on certain standard 
quantities of the four manures and assume that for any manure either a dressing of this 
standard amount is to be given, or it is to be withheld. This simplifies the experiment, 
for then every manure either is or is not applied, and our results can be classified by simple 
dichotomies. Of course more complicated experiments can be devised to allow for different 
quantities of fertiliser, but the simpler case will be sufficient for our purposes. 

We have then set up a population which can be classified according to six qualities, 
place, time, and the application of four manures. Our results are intended to show whether 
there is any variation in yield between these conditions and various combinations of them. 
Of course, it does not follow in deductive logic that if there is significant variation from year 
to year in the particular years chosen there will always be temporal or climatic variation ; 
and similarly, if there is significant variation from place to place it does not follow that 
other soil conditions which have not been tested will show a significant variation. To 
arrive at such conclusions we have to perform an ordinary generalisation by induction. 
What we shall say, if significant results appear, is that in the regions tested, or for the years 
tested, there were significant variations, and that it therefore appears likely that soil and 
climate exert a material effect on yield ; and we shall maintain this with more or less confidence 
according as our experience is wider or narrower. This is the familiar inductive 
inference which forms the basis of all scientific inquiry. 

25.20. Within any one block we shall wish to study the effect of manurial treatments 
not only separately but in combination. We therefore divide the block into sixteen compartments 
and treat them, respectively, with no manure, D, K, N, P, DK, DN, DP, KN, 
KP, NP, KNP, DNP, DKP, DKN and DKNP. Here every possible combination appears 
once and only once. To compare, for instance, the mean yields in the presence or absence 
of dung we add all the eight yields for plots on which no dung was spread and compare the 
total with the sum of the other eight. All the necessary comparisons can be made. 
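
The sixteen combinations and the comparison for dung can be set out mechanically. The 
following Python sketch is offered only as an illustration: the yields are hypothetical (a base 
yield with purely additive effects, chosen so that the answer is known in advance) and the 
name `main_effect` is our own. 

```python
from itertools import product

MANURES = "DKNP"

# All 2^4 = 16 treatment combinations; "" stands for the unmanured plot.
treatments = ["".join(m for m, used in zip(MANURES, flags) if used)
              for flags in product([False, True], repeat=4)]

# Hypothetical yields: base 20 plus an additive effect for each manure applied.
effects = {"D": 5.0, "K": 2.0, "N": 3.0, "P": 1.0}
yields = {t: 20.0 + sum(effects[m] for m in t) for t in treatments}


def main_effect(yields, manure):
    """Sum of the eight yields with the manure minus the sum of the eight without."""
    with_manure = sum(y for t, y in yields.items() if manure in t)
    without = sum(y for t, y in yields.items() if manure not in t)
    return with_manure - without


print(sorted(treatments, key=len))
print(main_effect(yields, "D"))  # eight times the assumed effect of dung
```

Because every other manure appears equally often with and without dung, its contribution 
cancels in the difference, which is the practical meaning of the balance of the design. 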

Data of this kind are said to be orthogonal. Each possibility arises an equal number of 
times. The reason for the use of the word is that such material is orthogonal in the sense 
we have considered in the analysis of variance. We saw in Chapters 23 and 24 that where 
cell-frequencies were equal the analysis was greatly simplified, and that under the customary 
hypotheses the estimates of means were independent. It is not, of course, absolutely 
necessary to have orthogonal data ; in fact, we have shown in Chapter 24 how to deal with 
the non-orthogonal case ; but it is evidently a great convenience to be able to arrange 
for orthogonality, and no efficiency is lost by doing so. 

Replication 

25.21. If, as suggested above, we divide each block into 16 plots and treat each differently, 
the analysis of variance of any block will have 15 degrees of freedom ; and if we 
cannot ignore any of the interactions there will be no residual variance due to “ error ”, 
that is to say we cannot estimate the reliability of our comparisons. All the 15 possible 
independent comparisons may be made, but we cannot decide whether differences are 
significant in the sense that they may be due to the other factors which we have agreed 
to allow to bear on the experiment, such as individual soil differences from plot to plot. 
If we are to estimate such “ error ” we must give the factors which produce it an opportunity 
of varying. This may be done by replicating the experiment, that is to say, by 
repeating it in the same form. For instance, suppose that we set up four blocks and divide 
each into 16 plots, applying our manurial treatments to each block. Then, assuming that 
there are no significant interactions between blocks and treatments (a matter which we 
can test by examining the interaction terms in the variance-analysis), we shall have 63 
degrees of freedom, of which 15 are assignable to treatments and their interactions and the 
remaining 48 to a “ residual ” term, the latter providing an estimate of experimental error. 
We have exemplified this process in Chapter 23. 


Randomisation 

25.22. Up to this point we have said nothing about the arrangement of our 16 plots 
within the block. Suppose we divide our block into plots of equal size. Is there any 
advantage in allocating the treatments systematically, or is it preferable to assign them 
at random ? 

We shall consider the relative merits of random and systematic arrangements in more 
detail below, but we can announce the general rule now : unless there is some good reason 
to the contrary, it is better to allot the treatments at random. Where possible, chance 
should be given full play. 

25.23. The justification for this rule in our present instance can be seen by reference 
to the section on randomised blocks in 23.41. We saw there that by randomising the 
allocation of plots we were able to preserve the z-distribution and hence to validate our 
tests of significance, even where normality in the parent form was not assumed. The 
process is essentially one of extending our hypothetical population. Instead of considering 
the observed yields as specimens of what might happen in repeated trials of the same variety 
of barley if the same manurial treatments were applied to the same plots, we consider the 
possible yields in repeated trials if the manurial treatments were applied in all possible 
ways to different plots. Our experiment is systematic in the sense that we prescribe a 
different treatment for each plot ; it is random to the extent that we allot the treatments 
to plots by chance. 

25.24. There is one source of possible confusion here which it is desirable to remove. 
In our agricultural example complications arise because of the physical contiguity of the 
plots, and we shall see below that it is often desirable to eliminate by special designs systematic 
fertility gradients in the soil. In other classes of experiment where we desire orthogonality, 
the members need not be subject to this kind of effect, and often are not. Reverting 
to the example of raw versus pasteurised milk which has already been mentioned, suppose 
we take a simplified case and wish to measure whether the two different milks have different 
effects on boys and girls. With a class of 40 children, 20 boys and 20 girls, we can proceed 
in several ways. It is obviously useless to give raw milk to all the boys and pasteurised 
milk to all the girls, for then we have no measure of the differential effect, if any, for either 
sex alone. We might toss up in each case and allot raw or pasteurised milk to each child 
by chance ; but this would probably make the data non-orthogonal. To attain orthogonality, 
we should allot 10 children to each of the four sub-groups BP, GP, BR, GR (where 
B = boy, G = girl, P = pasteurised, R = raw). We then have an analysis of variance : 

                                          Degrees of freedom 
Between sexes . . . . . . . . . . . .             1 
Between milks . . . . . . . . . . . .             1 
Residual (including interactions) . .            37 
Total . . . . . . . . . . . . . . . .            39 

This is analogous to a test of a cereal with two fertilisers and 10 replications. 

The question is, how should we allot the children to the four groups ? Their sex, of 
course, is determined, but the nature of the milk they receive is at choice. It is here that 
the randomisation will help. The ten children of a specified sex who receive raw milk 
should be chosen at random from the 20 available. In this instance it might be thought 
that any method would do ; but it is best to avoid the risk of bias. If the children were 
chosen by the teacher he might tend to select the 10 bigger boys or the 10 brighter boys. 
If they were chosen alphabetically, we might get brothers and sisters automatically receiv- 
ing the same treatment ; and so on. The randomisation process avoids all systematic 
effects of this kind and brings us a stage nearer to obtaining an unbiassed answer to our 
questions. 
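
The allotment just described, in which sex is fixed and the milk is assigned by chance within 
each sex, may be sketched as follows. This is an illustration only: the children's labels, the 
function name `allot_milk` and the use of Python's pseudo-random generator in place of 
Sampling Numbers are all assumptions of ours. 

```python
import random


def allot_milk(children_by_sex, n_raw=10, seed=None):
    """Within each sex choose n_raw children at random for raw milk (R);
    the remainder receive pasteurised milk (P)."""
    rng = random.Random(seed)
    groups = {}
    for sex, children in children_by_sex.items():
        raw = set(rng.sample(children, n_raw))
        groups[sex + "R"] = sorted(raw)
        groups[sex + "P"] = sorted(c for c in children if c not in raw)
    return groups


# Hypothetical class of 20 boys and 20 girls.
children = {"B": [f"boy{i:02d}" for i in range(1, 21)],
            "G": [f"girl{i:02d}" for i in range(1, 21)]}

groups = allot_milk(children, seed=1)
print({name: len(members) for name, members in groups.items()})
```

Each of the four sub-groups BR, BP, GR, GP then contains ten children, so that the data 
remain orthogonal while the choice of individuals is left to chance. 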

Sensitivity of a Test 

25.25. In some cases, where the variate is discontinuous, the nature of the test of 
significance which we propose to apply may make a difference to the form of the experiment. 
If we are testing a certain hypothesis which can produce a specified number m of experimental 
results which are acceptable as conforming to the hypothesis, whereas other 
hypotheses produce a number n of other results, we clearly want to keep m as small as 
possible compared with n. The ideal case, of course, is that of the “ crucial ” experiment 
in which the hypothesis can only give one result and other hypotheses give a different 
result. The result then proves or disproves the truth of the hypothesis, and no test of 
significance arises. In statistical practice we do not as a general rule perform crucial 
experiments, but we can sometimes design an experiment so that it is more crucial, if the 
expression be allowed, than alternative methods. 

25.26. Consider, for instance, the case of a cashier who claims to be able to detect 
good money from false at a glance. To test this ability we spread ten coins before him, 
tell him that p are good, and ask him to point them out. What number of good coins p 
should we include among the ten ? 

If the cashier had no power of discrimination and there are p good coins, the probability 
that he would guess right by chance is 

1 / \binom{10}{p} , 

for the total number of ways of selecting p from 10 is the denominator of this fraction and 
only one of them is right. Now we want to choose p so as to minimise the probability of 
such an event, i.e. so as to maximise \binom{10}{p}. This is clearly done when p = 5, so that we 
ought to have five good and five bad coins in the set. Any other number would increase 
the probability that he might be right by chance and hence decrease the sensitivity of the 
experiment. 
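
The choice of p can be confirmed by direct enumeration. The short Python sketch below is 
ours, not the author's, and simply tabulates the chance of a correct guess for each p from 
1 to 9. 

```python
from math import comb

# Probability that a cashier with no power of discrimination picks out
# exactly the p good coins among ten by chance alone.
chance = {p: 1 / comb(10, p) for p in range(1, 10)}

best_p = min(chance, key=chance.get)
for p, prob in chance.items():
    print(p, comb(10, p), round(prob, 4))
print("most sensitive choice: p =", best_p)
```

With p = 5 there are 252 possible selections, so that a correct answer by pure chance has a 
probability of 1/252, rather less than one-half of one per cent. 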


Latin Squares 

25.27. We now proceed to consider a different type of design, which has been freely 
applied in agriculture but may also be applied to other forms of inquiry. Suppose we 
have a variety of barley to test and five different treatments to apply. We will assume 
that replication has been considered necessary and will replicate five times, the same number 
as the treatments. We will then divide our block into 25 plots like a chessboard (though 
the plots may be rectangular and need not be exact squares, provided they are all the same 
size). Each row may be considered a replication of the five treatments, and this itself 
involves the appearance of each treatment once and only once in each row. Can we extend 
the arrangement and ensure that in addition the treatments will occur just once in each 
column ? 

The answer is affirmative, as the following example shows ; — 


A   B   C   D   E 
B   C   A   E   D 
C   E   D   A   B 
D   A   E   B   C 
E   D   B   C   A 


An arrangement of this kind is called a “ Latin square ”. It was studied extensively by 
Euler in the eighteenth century, though not of course from the statistical viewpoint. 
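
The defining property is easily verified by machine. The following Python sketch (the 
function name `is_latin_square` is hypothetical) checks that every symbol occurs just once 
in each row and in each column, and applies the check to the 5 × 5 example above. 

```python
def is_latin_square(rows):
    """True if every symbol occurs exactly once in each row and each column."""
    p = len(rows)
    symbols = set(rows[0])
    if len(symbols) != p:
        return False
    if any(len(row) != p or set(row) != symbols for row in rows):
        return False
    return all({row[j] for row in rows} == symbols for j in range(p))


square = ["ABCDE",
          "BCAED",
          "CEDAB",
          "DAEBC",
          "EDBCA"]

print(is_latin_square(square))
```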

25.28. The advantage of this arrangement lies in the fact that it eliminates possible 
correlational effects due to fertility gradients in the soil or accidental circumstances which 
may exercise a “ patchy ” influence on the whole block. If we could be sure that there 
were no such influences at work, and that the soil was entirely homogeneous in the block, 
it would not matter where the treatments were placed ; but by imposing the restriction 
that no treatment appears more than once in the same row or column we remove at least 
horizontal and vertical gradients from our comparisons. Suppose in fact that there were 
gradients running across the block and down it. When we work out the mean yield of the 
treatment A we shall add together five values, one of each in the various rows and columns. 
Similarly for B, so that a comparison of A and B is not affected by the systematic influences, 
which work equally on both. 

It is not, of course, true that the Latin square arrangement eliminates every effect due 
to soil heterogeneity. There might be systematic effects running diagonally which might 
still remain. It is, however, clear that in removing the effects in two perpendicular direc- 
tions we have substantially improved the comparison of mean yields as compared with 
a systematic arrangement. 


25.29. The analysis of variance of a p × p Latin square may be carried out in the 
following form 

                                Sum of squares d.f. 
Between rows . . . . . . .          p - 1 
Between columns . . . . . .         p - 1 
Between treatments . . . .          p - 1 
Residual . . . . . . . . .          (p - 1)(p - 2) 
Total . . . . . . . . . . .         p^2 - 1            . . . (25.16) 

and the four constituent sums are, on the hypothesis of homogeneity, distributed as v\chi^2 
independently. Before proving this result we will consider an example. 

Example 25.1 (from Thomson, Brit. J. Educ. Psych., 1941, 11, 135 ; data by S. D. Nisbet). 

A set of children were divided into four equal groups and each group was given four 
lists of words to test spelling ability. Each list formed one of four different types of test 
which we denote by A, B, C, D. The arrangement of the experiment is shown in the 
following table, together with the total scores of the corresponding groups ; — 


                          Groups of children 
                   1        2        3        4      Totals 
Lists of    1    A  81    B  41    C  44    D  53     219 
words       2    D  38    A  97    B  42    C  49     226 
            3    C  31    D  43    A  67    B  36     177 
            4    B  57    C  33    D  43    A  81     214 
Totals             207      214      196      219     836 


For instance, the first group of children had the first list of test A, the second of test 
D, and so on. No group had the same lists as another group, and each list was used exactly 
once. The scores (corresponding to yields in the agricultural case) were in fact the number 
of words spelled wrongly in a prior test but correctly in this test. 

The above table, of course, does not represent anything corresponding to the physical 
layout of an agricultural experiment, but it shows how a similar object can be secured to 
the avoidance of contiguous effects. Since it is possible that some relationship may exist 
between the lists of words and the tests (e.g. by accident one list might be particularly 
unsuitable for a test), we wish to ensure that not only will each group of children have 
each of the four tests, but that no list shall be given more than once and every list at least 
once. This is precisely what the Latin square accomplishes. The fact that the diagonal 
arrangement of the letters is systematic does not affect the present inquiry, though in an 
agricultural experiment a systematic diagonal fertility gradient might affect comparisons 
between treatments. 


An analysis of variance on the usual lines gives the following results : 

                          Sum of Squares.    d.f.    Quotient. 
Lists (rows) . . . .           359.5           3       119.83 
Groups (columns) . .            74.5           3        24.83 
Tests (treatments) .          4626.5           3      1542.17 
Residual . . . . . .           606.5           6       101.08 
Totals . . . . . . .          5667.0          15 


The differences between lists are evidently not significant, from which we should conclude 
that they appear to be on a par so far as these tests are concerned. The quotient due to 
groups indicates that the children are more alike than chance would lead us to expect, but 
not significantly so, for the variance ratio 101.08/24.83 = 4.1, \nu_1 = 6, \nu_2 = 3, is not 
significant. On the other hand, the quotient due to tests is very significant, the ratio 
1542.17/101.08 = 15.3, \nu_1 = 3, \nu_2 = 6, being beyond the 1 per cent point. We conclude 
that there do exist differences between the tests. 
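
The arithmetic of Example 25.1 may be recomputed from the table of scores. The Python 
sketch below is an illustration of ours; the variable names are hypothetical, and the sums of 
squares are obtained in the usual way from the marginal totals and the correction term. 

```python
# Rows are lists of words, columns are groups of children, letters are tests.
layout = ["ABCD", "DABC", "CDAB", "BCDA"]
scores = [[81, 41, 44, 53],
          [38, 97, 42, 49],
          [31, 43, 67, 36],
          [57, 33, 43, 81]]

p = len(scores)
grand_total = sum(map(sum, scores))
correction = grand_total ** 2 / p ** 2


def sum_of_squares(totals):
    """Sum of squares between p classes, each class total being a sum of p scores."""
    return sum(t ** 2 for t in totals) / p - correction


row_totals = [sum(row) for row in scores]
col_totals = [sum(row[c] for row in scores) for c in range(p)]
test_totals = {}
for r in range(p):
    for c in range(p):
        test_totals[layout[r][c]] = test_totals.get(layout[r][c], 0) + scores[r][c]

ss_total = sum(x ** 2 for row in scores for x in row) - correction
ss_rows = sum_of_squares(row_totals)
ss_cols = sum_of_squares(col_totals)
ss_tests = sum_of_squares(test_totals.values())
ss_resid = ss_total - ss_rows - ss_cols - ss_tests

df_resid = (p - 1) * (p - 2)
ratio = (ss_tests / (p - 1)) / (ss_resid / df_resid)
print(ss_rows, ss_cols, ss_tests, ss_resid, ss_total)
print(round(ratio, 1))
```

The significance of the ratio must still be judged from tables of the variance-ratio 
distribution, which the sketch does not attempt to reproduce. 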


Construction of Latin Squares 

25.30. The number of possible Latin squares of order p is very large for high values 
of p. There are, for example, 576 squares of order 4 ; 161,280 squares of order 5 ; 812,851,200 
of order 6 and 61,479,419,904,000 of order 7. Up to this order they have been enumerated. 
Although many examples of squares of higher orders are known, the problem of enumeration 
for p ≥ 8 awaits solution. Details and examples will be found in Fisher and Yates’ 
Statistical Tables. 

By interchanging rows and columns the square can always be brought to a form in 
which the top row and left-hand column are in the order ABC, etc. It is then said to be 
a “ standard square ”. For instance, there are four standard squares of the fourth order : 

A B C D      A B C D      A B C D      A B C D 
B A D C      B C D A      B D A C      B A D C 
C D B A      C D A B      C A D B      C D A B      . . . (25.17) 
D C A B      D A B C      D C B A      D C B A 


From each of these, 144 ( = 4 ! 3 !) squares may be derived by permuting all columns and 
all rows except the first. (There is no point in permuting the first row, because the result 
would be a repetition of squares already obtained with an interchange of the letters 
A . . . D, not an essentially different layout.) The total number of squares, as stated 
above, is therefore 4 × 144 = 576. 

It is only necessary to specify the standard squares. To select a Latin square at 
random we choose a standard form at random and then permute rows and columns at 
random, the randomising process being most conveniently carried out by Sampling 
Numbers. For squares of order 8 or more, where the standard types have not been enumerated, 
we can only choose one of those which has, and hence select one at random from a 
restricted set of all possible squares. 
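
For the fourth order the enumeration and the random selection described above can both 
be carried out by brute force. The following Python sketch is an illustration under our own 
assumptions: the function names are hypothetical, symbols are coded 0 to 3 and printed as 
A to D, and a pseudo-random generator stands in for Sampling Numbers. 

```python
import random
from itertools import permutations


def latin_squares(p):
    """Yield every Latin square of order p on the symbols 0..p-1, built row by row."""
    candidate_rows = list(permutations(range(p)))

    def extend(square):
        if len(square) == p:
            yield tuple(square)
            return
        for row in candidate_rows:
            # A new row is admissible if it repeats no symbol in any column.
            if all(row[j] != prev[j] for prev in square for j in range(p)):
                yield from extend(square + [row])

    yield from extend([])


def is_standard(square):
    """Top row and left-hand column both in natural order."""
    p = len(square)
    return square[0] == tuple(range(p)) and all(square[i][0] == i for i in range(p))


squares = list(latin_squares(4))
standard = [s for s in squares if is_standard(s)]
print(len(squares), len(standard))


def random_square(rng):
    """Choose a standard square, then permute all columns and all rows but the first."""
    base = rng.choice(standard)
    cols = rng.sample(range(4), 4)
    rows = [0] + rng.sample(range(1, 4), 3)
    return ["".join("ABCD"[base[r][c]] for c in cols) for r in rows]


print("\n".join(random_square(random.Random(0))))
```

The same exhaustive search becomes rapidly impracticable as the order increases, which is 
why tabulated standard squares are used for the higher orders. 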

Analysis of Variance for Latin Squares 

25.31. We must now justify our assertion that the Latin square may be analysed 
in the form (25.16), and that the z-test applies to the variance ratios which arise in the 
analysis. 

For an ordinary two-way classification we have 

\sum (x_{jk} - x_{..})^2 = \sum (x_{j.} - x_{..})^2 + \sum (x_{.k} - x_{..})^2 + \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 . 

Thus, if x_r is the mean of rows and x_c that of columns in the Latin square, we have, writing 
x for x_{..}, 

\sum (x_{rc} - x)^2 = \sum (x_r - x)^2 + \sum (x_c - x)^2 + \sum (x_{rc} - x_r - x_c + x)^2 . (25.18) 

and the three parts on the right are distributed independently as v\chi^2 with p - 1, p - 1 and 
(p - 1)(p - 1) degrees of freedom respectively. 

Now 

\sum (x_{rc} - x_r - x_c + x)^2 = \sum (x_t - x)^2 + \sum (x_{rc} - x_r - x_c - x_t + 2x)^2 
        + 2 \sum (x_t - x)(x_{rc} - x_r - x_c - x_t + 2x) . . . (25.19) 

where x_t is the mean of treatments. 

Consider the cross-product term in (25.19). The summation takes place over all 
values in the Latin square. Let us confine our attention to the summation for some particular 
treatment. For this summation the factor (x_t - x) is constant. Summation for 
the other factor gives 

\sum (x_{rc} - x_r - x_c - x_t + 2x) = p x_t - \sum x_r - \sum x_c - p x_t + 2px . (25.20) 

and since one treatment occurs in each row and column, 

\sum x_r = px , 
\sum x_c = px ,                    . . . (25.21) 

and hence the sum (25.20) vanishes. 

Thus the cross-product in (25.19) vanishes also and we have 

\sum (x_{rc} - x)^2 = \sum (x_r - x)^2 + \sum (x_c - x)^2 + \sum (x_t - x)^2 
        + \sum (x_{rc} - x_r - x_c - x_t + 2x)^2 . . . (25.22) 

This gives us the analysis of the sums of squares, and it only remains to show that the third 
term on the right in (25.22) is independent of the fourth. It will then follow that the four 
terms are distributed independently with p - 1, p - 1, p - 1 and (p - 1)(p - 2) degrees 
of freedom. 

The required property of independence can be established directly, but it also follows 
from considerations of symmetry in the Latin square which have an interest of their own. 
We have regarded the square as composed of rows and columns, with treatments allotted 
in a certain way ; but by rearrangement we can equally well regard it as composed of rows 
and treatments with columns allocated in a certain way. For instance, if we take the 
first standard square in (25.17) we may write it : 


                   Treatment : 
               A      B      C      D 
Rows :   1    C_1    C_2    C_3    C_4 
         2    C_2    C_1    C_4    C_3 
         3    C_4    C_3    C_1    C_2 
         4    C_3    C_4    C_2    C_1 

where, for instance, treatment A occurs in row 1, column 1 (C_1), row 2, column 2 (C_2), and 
so on. This, of course, is not a physical layout, but that is immaterial for present purposes. 
It follows that since the sum of squares between columns is independent of the residual in 
(25.22), so also is that between treatments. 

The variance analysis then takes the form 

                     Sum of Squares.                                  d.f. 
Rows . . . . .       \sum (x_r - x)^2                                 p - 1 
Columns . . . .      \sum (x_c - x)^2                                 p - 1 
Treatments . .       \sum (x_t - x)^2                                 p - 1 
Residual . . .       \sum (x_{rc} - x_r - x_c - x_t + 2x)^2           (p - 1)(p - 2) 
Totals . . . .       \sum (x_{rc} - x)^2                              p^2 - 1 
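
The algebraic identity (25.22), and the vanishing of the cross-product term in (25.19), can 
be checked numerically for any set of values laid out in a Latin square. The Python sketch 
below is ours and purely illustrative: it uses the first standard square of (25.17) and arbitrary 
pseudo-random values in place of observations. 

```python
import random

layout = ["ABCD", "BADC", "CDBA", "DCAB"]  # first standard square of (25.17)
p = len(layout)
rng = random.Random(0)
x = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(p)]  # arbitrary values

cells = [(r, c) for r in range(p) for c in range(p)]
mean = sum(x[r][c] for r, c in cells) / p ** 2
row_mean = [sum(x[r]) / p for r in range(p)]
col_mean = [sum(x[r][c] for r in range(p)) / p for c in range(p)]
trt_mean = {t: sum(x[r][c] for r, c in cells if layout[r][c] == t) / p
            for t in "ABCD"}


def residual(r, c):
    """The quantity x_rc - x_r - x_c - x_t + 2x for one cell."""
    return x[r][c] - row_mean[r] - col_mean[c] - trt_mean[layout[r][c]] + 2 * mean


total = sum((x[r][c] - mean) ** 2 for r, c in cells)
rows = sum((row_mean[r] - mean) ** 2 for r, c in cells)
cols = sum((col_mean[c] - mean) ** 2 for r, c in cells)
trts = sum((trt_mean[layout[r][c]] - mean) ** 2 for r, c in cells)
resid = sum(residual(r, c) ** 2 for r, c in cells)
cross = sum((trt_mean[layout[r][c]] - mean) * residual(r, c) for r, c in cells)

print(abs(cross) < 1e-9)
print(abs(total - (rows + cols + trts + resid)) < 1e-9)
```

The check concerns only the partition of the sum of squares ; the distributional statements 
rest on the argument given in the text. 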


25.32. The above form provides a homogeneity test of the usual kind. If the test 
proves significant of heterogeneity we may, in the usual way, consider the hypothesis that 

^rc ^rc • - . . . (25.24) 

where \zeta_{rc} is normally distributed about zero mean. We leave it to the reader to show, as 
in Chapter 23, that in such an event the residual mean square is an unbiassed estimate of 
the variance of \zeta with (p - 1)(p - 2) degrees of freedom. 


25.33. As in the case of randomised blocks, it appears that under certain general 
conditions the z-distribution is reproduced approximately for fixed values which are permuted 
in all the permissible ways consistent with the Latin square design. We omit an 
investigation into this result (for which see Welch, 1937) as the algebra is considerably 
more complicated than for randomised blocks. The result has been confirmed by a limited 
number of experiments. 


Graeco-Latin and Orthogonal Squares 

25.34. If the two squares 

A B C D        A B C D 
B A D C        C D A B 
C D A B        D C B A        . . . (25.25) 
D C B A        B A D C 

are superposed we have the arrangement 

AA  BB  CC  DD 
BC  AD  DA  CB 
CD  DC  AB  BA                . . . (25.26) 
DB  CA  BD  AC 


in which every possible pair of letters (XY being regarded as different from YX) appears 
just once. Such a pair of squares is said to be orthogonal. The form (25.26) is sometimes 
written with Greek letters instead of the second Roman set ; hence the name of 
Graeco-Latin square. It is also possible to superpose a third factor which we will denote by the 
numerals 1-4 in such a way that each combination of any pair of types occurs just 
once, e.g. 

Aα1   Bβ2   Cγ3   Dδ4 
Bγ4   Aδ3   Dα2   Cβ1 
Cδ2   Dγ1   Aβ4   Bα3 
Dβ3   Cα4   Bδ1   Aγ2 
Complete sets of orthogonal squares (i.e. those in which there are p - 1 factors for a p × p 
square) are known for all prime p and for p = 4, 8 and 9. Curiously, there is no set for 
p = 6. Up to and including p = 7 they have been enumerated. 

We shall not enter here into the use of these squares in experimental design. They 
are generalisations of the Latin square in which, by suitable arrangements, several factors 
can be tried out simultaneously, so that all possible combinations of pairs occur an equal 
number of times. 
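
Orthogonality of two squares is simple to test: on superposition the p^2 cells must yield 
p^2 distinct pairs. The following Python sketch (the function name `are_orthogonal` is 
hypothetical) applies the test to the two squares of (25.25) and to the square of numerals 
superposed on them above, the Greek letters being coded by the corresponding Roman ones. 

```python
def are_orthogonal(first, second):
    """True if superposing the squares gives every ordered pair exactly once."""
    p = len(first)
    pairs = {(first[r][c], second[r][c]) for r in range(p) for c in range(p)}
    return len(pairs) == p * p


latin = ["ABCD", "BADC", "CDAB", "DCBA"]
greek = ["ABCD", "CDAB", "DCBA", "BADC"]      # second square of (25.25)
numerals = ["1234", "4321", "2143", "3412"]   # third factor

print(are_orthogonal(latin, greek))
print(are_orthogonal(latin, numerals))
print(are_orthogonal(greek, numerals))
print(are_orthogonal(latin, latin))  # a square is never orthogonal to itself for p > 1
```

For p = 4 the three squares form a complete set, there being p - 1 = 3 mutually orthogonal 
factors. 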


Confounding 

25.35. It will be evident that if we wish to consider in full a classification according 
to several variates, particularly with replications, the number of individual members in 
the sample may be very large. For instance, if we wish to test a variety of barley with 
three different applications of four types of fertiliser, there must be 81 yields even without 
replication, if we want to make all the comparisons possible. Physical considerations may 
make a layout of an experiment on such a scale impossible. The difficulty is possibly more 
serious in experiments on expensive animals such as cows. 

Where economy in the size of sample is a very material factor we may be able to reduce 
the sample at the expense of sacrificing some of the less important comparisons. For 
example, to consider once again the case of barley and the effect of fertilisers : we shall 
undoubtedly wish to compare yields of D and not-D, K and not-K, P and not-P, N and 
not-N. We may also wish to compare first-order interactions of the type DK and not-D, K. 
But it is quite possible that interactions of higher order, such as the effect of dung in the 
presence of two other fertilisers, are negligible. Where we are prepared to assume that this 
is so, on the basis of prior evidence or otherwise, we can dispense with certain information 
and still make the comparisons we wish while retaining properties of orthogonality. 


25.36. Consider, as an illustration, an experiment with three fertilisers, each of which 
is applied or not applied, say N, P and K, and four replications. In the ordinary way 
there would be 32 plots and we should have an analysis of variance as follows, assuming 
that block-treatment interactions may be regarded as part of the residual : 


Sum of squares. 
Blocks 
N 
P 
K 

NP . 

NK . 

PK . 

NPK 

Residual 

Total 


d.f. 

3 

1 

1 

1 

1 

1 

1 

1 

21 

31 
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Now suppose that we divide our main blocks into two sub-blocks, the first containing 
the treatments 


0 (None), NP, NK, PK (25.28)

and the second the treatments

N, P, K, NPK (25.29)

We may then analyse the variance as follows, regarding the sub-blocks as blocks of four
plots each:

        Sum of squares.      d.f.
        Blocks                  7
        N                       1
        P                       1
        K                       1
        NP                      1
        NK                      1
        PK                      1
        Residual               18
        Total                  31


In fact, if we wish to compare the yields with N and those without N, i.e.

N + NPK + NP + NK

with 0 + PK + P + K,

it will be seen that we add two members from (25.28) and two from (25.29), so the difference
is not affected by block differences; and similarly for the other comparisons. Such a
design is said to be balanced, and the interaction NPK is confounded with block-differences,
since in the eight blocks it cannot now be isolated from block effects. The advantage of
the second design over the first is that, without losing anything appreciable in comparisons
between treatments, we have gained a good deal in the assessment of block effects; for the
residual has only declined from 21 to 18 d.f. whereas the sum of squares between blocks
has increased from 3 to 7 d.f.
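The confounding described above can be checked mechanically. The following is a minimal sketch in Python, not part of the original account: the helper names (`label`, `contrast_sign`, `sub_block_sums`) are hypothetical, and the usual sign convention for a factorial contrast (plus where the factor is applied, minus where it is not, interactions as products) is assumed. For each effect it sums the contrast signs within the two sub-blocks (25.28) and (25.29).

```python
from itertools import product

FACTORS = ("N", "P", "K")

def label(levels):
    """Name a treatment combination: (1, 0, 1) -> 'NK', (0, 0, 0) -> '0'."""
    return "".join(f for f, on in zip(FACTORS, levels) if on) or "0"

def contrast_sign(effect, levels):
    """Sign (+1 or -1) with which a treatment enters the contrast for `effect`."""
    sign = 1
    for f, on in zip(FACTORS, levels):
        if f in effect:
            sign *= 1 if on else -1
    return sign

treatments = list(product((0, 1), repeat=3))
sub_blocks = [{"0", "NP", "NK", "PK"},   # (25.28)
              {"N", "P", "K", "NPK"}]    # (25.29)

def sub_block_sums(effect):
    """Sum of the contrast signs inside each sub-block."""
    return tuple(sum(contrast_sign(effect, t) for t in treatments if label(t) in block)
                 for block in sub_blocks)

effects = ["N", "P", "K", "NP", "NK", "PK", "NPK"]
sums = {e: sub_block_sums(e) for e in effects}
for e in effects:
    print(e, sums[e])
```

A pair of zeros means that the contrast is balanced within each sub-block and so is free of block differences; for NPK every member of one sub-block enters with the same sign, so that the contrast coincides with the difference between sub-blocks.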


25.37. The ideas of orthogonality, randomisation, balance and confounding have 
been developed to an advanced degree and with great ingenuity, particularly by Fisher 
and Yates. The slight sketch we have given of the methods in this chapter is intended to 
be no more than illustrative of the way in which the theory of experimental design is capable 
of development, at least in certain fields, and the manner in which efficiency may be imported 
into a practical inquiry by a due regard to theoretical requirements of the design. For a 
comprehensive account of this branch of the subject the reader should consult Fisher’s
Statistical Methods and Design of Experiments, Yates (1937b), and a useful introductory
account by Goulden (1939). At this point we leave these particular topics and return to
certain general matters. 

Design and Randomisation 

25.38. Whenever an inference is to be made, and particularly where hypothetical 
populations are concerned, the reader will find it useful to ask himself what precisely is the 
population under consideration. We can illustrate the point very usefully by discussing
a subject on which there has recently been difference of authoritative opinion: that of
occasional conflict between the requirements of balancing and randomisation.

25.39. Consider in the first place the testing of a cereal under two treatments, denoted
by A and B; and to simplify matters as much as possible, suppose we are to sow eight
plots in a straight line. In what order shall we allot the treatments?

If the plots are not too large so that the row covers a big area, it is quite possible that
there may be a trend of fertility in the soil itself which will affect yields differentially and
hence interfere with comparisons which we might make. Suppose that we do wish to
guard against a fertility gradient so far as possible. We might then decide on one of the
“balanced” arrangements:

AABBBBAA (25.30)

ABBAABBA (25.31)

ABABBABA (25.32)

As will be easily seen, if there is a linear gradient in fertility along the row the means of
A and B treatments respectively will be affected to the same extent and hence their difference
unaffected. For instance, consider (25.30) and suppose the linear gradient is represented
by an additive factor $q + kp$, $k = 1, \ldots, 8$. On the hypothesis that the remaining
effect consists of a constant $a$ for A-treatments with a normal residual $\xi$, and similarly
$b$ for B, the yields are

A-treatments: $q + p + a + \xi_1$, $q + 2p + a + \xi_2$, $q + 7p + a + \xi_7$, $q + 8p + a + \xi_8$

B-treatments: $q + 3p + b + \xi_3$, $q + 4p + b + \xi_4$, $q + 5p + b + \xi_5$, $q + 6p + b + \xi_6$

with means

$\tfrac{1}{4}(4q + 18p) + a + \tfrac{1}{4}(\xi_1 + \xi_2 + \xi_7 + \xi_8)$

$\tfrac{1}{4}(4q + 18p) + b + \tfrac{1}{4}(\xi_3 + \xi_4 + \xi_5 + \xi_6)$

respectively. The difference of these two is independent of $q$ and $p$.

25.40. The alternative procedure in allotting treatments would be to distribute
them at random. Such balanced arrangements as (25.30)-(25.32) might then arise by
chance. But we might also get such an arrangement as

AAAABBBB (25.33)

What are we to do in such circumstances ? If we reject this arrangement we are rejecting 
the random allocation of treatments in favour of systematisation. If we accept it we 
know quite well that a fertility gradient, if it exists, will invalidate the inquiry. 
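The effect of a linear gradient on the four arrangements may be verified numerically. The sketch below is illustrative only: the function name and the values chosen for $q$ and $p$ are assumptions, and only the gradient term $q + kp$ is included, the treatment constants and residuals being left out so that the bias stands alone.

```python
def gradient_bias(layout, q=3.0, p=0.5):
    """Mean of A plots minus mean of B plots when each plot k = 1, 2, ...
    contributes only the fertility term q + k*p."""
    a = [q + k * p for k, t in enumerate(layout, start=1) if t == "A"]
    b = [q + k * p for k, t in enumerate(layout, start=1) if t == "B"]
    return sum(a) / len(a) - sum(b) / len(b)

layouts = {
    "(25.30)": "AABBBBAA",
    "(25.31)": "ABBAABBA",
    "(25.32)": "ABABBABA",
    "(25.33)": "AAAABBBB",
}
bias = {name: gradient_bias(layout) for name, layout in layouts.items()}
for name, value in bias.items():
    print(name, value)
```

For the three balanced arrangements the plot positions of A and of B both sum to 18, so the bias is zero whatever $q$ and $p$ may be; for (25.33) the positions sum to 10 and 26, giving a bias of $-4p$ in the difference of means.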

The reader will no doubt agree that, if other things are equal, the balanced arrange- 
ment is better than the arrangement (25.33). What we have to examine is whether other 
things are equal ; in short, whether in rejecting randomisation we have lost anything 
useful in the testing of significance. 

25.41. Consider a rather more general case in which an experimental area is laid
out in $p$ blocks of $q$ treatments each. If the subscript $j$ refers to blocks and $k$ to treatments,
we have the usual analysis with sum of squares between blocks ($p - 1$ d.f.), between
treatments ($q - 1$ d.f.), and residual ($(p - 1)(q - 1)$ d.f.).

Now we have seen that if the individual plot-yield can be regarded as a block effect
plus a treatment effect plus a normal residual with constant variance from plot to plot,
the significance of treatment effects can be judged from the z-test in the usual way by
comparing sum of squares between treatments with the residual sum of squares. This
is true whether treatments are allocated at random or not.

But suppose we wish to adopt the alternative viewpoint of 23.41 and make the infer- 
ence in the set of values obtained by permuting the observed values. These permutations 
will not affect the block means or the total mean, and hence the sum of squares between 
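blocks remains constant. The remaining part of the analysis may be written:

                Sum of Squares.                                                 d.f.
  Treatment     $S_1 = p \sum_k (x_{.k} - x_{..})^2$                            $q - 1$
  Residual      $S_2 = \sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2$        $(p - 1)(q - 1)$
  Totals        $S_3 = \sum_{j,k} (x_{jk} - x_{j.})^2$                          $p(q - 1)$
                                                                                (25.34)

Rather remarkably, the z-test holds for the ratio

\[ \frac{(p - 1)(q - 1)\, S_1}{(q - 1)\, S_2} \]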

provided that treatments are allocated at random, independently of the distribution of 
residual effects in individual plots. 

25.42. Consider, then, the population of values, $(q!)^p$ in number, obtained by
permuting the observed values within each block. The total sum of squares $S_3$ in (25.34)
is the same for all members. Consequently if $S_1$ is too great, $S_2$ must be too small and
vice-versa; and in general, if we confine ourselves to certain layouts and reject others, all
the possible values of $S_1$ cannot appear. It is this fact which has been seized on by advocates
of randomisation. They point out that for balanced layouts $S_1$ tends to be smaller than for random
layouts (a conclusion supported by experiment); consequently that the test of significance
is invalidated and the estimate of error $S_2$ too big. The difference between the two modes
of thought may be expressed briefly in this way : with balanced layouts the real error is 
reduced but the estimate of error is too large, so that the significance of the result is more 
in doubt ; whereas with random layouts the estimate of error is exact but the error itself 
may be larger. The question is whether one prefers to be nearer the truth without knowing 
how near, or farther from the truth with a knowledge of the limits of error. 
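The constancy of the total over the permutation population can be illustrated on a small scale. The following sketch assumes a hypothetical table of yields with $p = 3$ blocks and $q = 3$ treatments (the figures are invented for illustration) and enumerates every permutation of the observed values within blocks, computing the treatment, residual and total sums of squares for each in the standard form of the randomised-block analysis.

```python
from itertools import permutations, product

def sums_of_squares(table):
    """Return (S1, S2, S3) for a p x q table: rows are blocks, columns treatments."""
    p, q = len(table), len(table[0])
    grand = sum(map(sum, table)) / (p * q)
    block_means = [sum(row) / q for row in table]
    treat_means = [sum(table[j][k] for j in range(p)) / p for k in range(q)]
    s1 = p * sum((m - grand) ** 2 for m in treat_means)
    s2 = sum((table[j][k] - block_means[j] - treat_means[k] + grand) ** 2
             for j in range(p) for k in range(q))
    s3 = sum((table[j][k] - block_means[j]) ** 2
             for j in range(p) for k in range(q))
    return s1, s2, s3

# Hypothetical yields, one row per block.
observed = [[4.1, 5.0, 6.2], [3.8, 4.4, 5.9], [5.0, 5.1, 7.3]]

# Every rearrangement of the observed values within each block.
results = [sums_of_squares([list(row) for row in rows])
           for rows in product(*(permutations(row) for row in observed))]

print(len(results))                                  # number of members
print(min(r[0] for r in results), max(r[0] for r in results))
```

In every member the treatment and residual sums of squares add to the same total, so that a large value of the one must be accompanied by a small value of the other.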


25.43. For details of the controversy on this topic the reader may consult the papers 
referred to at the end of the chapter. It brings into prominence an important question 
of inference which can only be decided by the experimenter himself. If he chooses to 
regard any act of experimentation as one of a large population of such acts, to be carried 
out by himself or other workers, he may prefer randomisation in all circumstances, not- 
withstanding that every now and again he will hit by chance on a design which he knows 
is likely to give misleading results. But if he cannot take this very detached attitude (and 
most experimenters, being human, would think it poor compensation that their own errors 
are balanced by the better luck of other people) then he will prefer to design a balanced 
layout, even if the exactitude of his tests of significance is impaired.

25.44. We must, however, not leave the reader with the impression that the
desiderata of both schools of thought are totally incompatible. It frequently happens that
one can select a design which is both balanced and random. The Latin square is a good
example. By imposing the restriction that a treatment must not appear more than once
in a row or column we remove to some extent the interference of fertility gradients; by
requiring that it shall appear just once we balance the design; and by leaving the rest
of the layout to be determined by a random selection from all possible Latin squares of
that order we randomise so as to reproduce the distribution of the variance ratio in the
required form, thus, as “Student” remarked, “conforming to all the principles of allowed
witchcraft”.

EXERCISES

25.1. A population is given by specifying the frequencies in comparatively narrow
ranges of one variate, the frequency in the $i$th range being $N_i$ and ranges being of equal
width. Show that if the population frequencies are large, the best estimator of the mean
of a second variate which is linearly related to the first (in the sense of the unbiassed estimator
of minimum variance) in a sample obtained by taking $n_i$ members from the $i$th range is
given when $n_i$ is proportional to $N_i$.

25.2. Extend the result of the previous exercise to the case where ranges are of 
unequal width. 

If the number of farms in England and Wales is known in the acreage ranges 0-49, 
50-99, 100-199, 200-499, 500 and over, what sampling proportions would you take in the 
various ranges to estimate the total acreage under wheat?

25.3. If a variate $\xi$ can be regarded as the sum of a systematic component $\xi(x)$ and
an uncorrelated random component $\varepsilon_1$, and $\eta$ similarly as $\eta(x) + \varepsilon_2$, and if the random
components are uncorrelated with each other, show that

\[ r(\xi, \eta) = \frac{\operatorname{cov}\{\xi(x), \eta(x)\}}{\{(\operatorname{var} \xi(x) + \operatorname{var} \varepsilon_1)(\operatorname{var} \eta(x) + \operatorname{var} \varepsilon_2)\}^{1/2}}. \]

Hence, if a population is divided into strata the correlation between $\xi$ and $\eta$ for these strata
will, in general, be less than that obtained by combining strata to obtain larger units;
and as the strata are further subdivided the correlation between $\xi$ and $\eta$ tends to zero.

(Spearman, 1907, Am. J. Psych., 18 ; Wold, 1938a.) 

25.4. Illustrate the effect of the foregoing exercise by calculating the correlation 
coefficients for the data of Table 14.4 (vol. I, p. 333), (a) by adding the variates in pairs 
and so obtaining 24 values; (b) by repeating the operation and obtaining 12 values;
and (c) by repeating the operation and so obtaining 6 values. 

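25.5. (Markoff’s theorem.) Consider a sample of $n$ independent values $x_1, \ldots, x_n$,
$x_i$ being drawn from a population with mean $\mu_i$ and variance $\sigma_i^2$. Suppose we have
a function $\theta$ defined by

\[ \theta = \sum_{j=1}^{s} b_j p_j, \]

where the $b$’s are known and the parameters $p_j$ depend on the $\mu$’s according to the equation

\[ \mu_i = \sum_{j=1}^{s} a_{ij} p_j, \]

the $a$’s also being known. Then an unbiassed estimator of $\theta$, say $t$, with minimum variance
may be written

\[ t = \sum_{i=1}^{n} \lambda_i x_i. \]

Show that the function $t$ is given by substituting for the $p$’s in the expression for $\theta$ the
functions $q$ given by minimising

\[ S = \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \Bigl( x_i - \sum_{j=1}^{s} a_{ij} q_j \Bigr)^2 \]

with regard to the $q$’s considered as independent variables.

Show further that if this minimum value is $S_0$ the estimated variance of $t$ is

\[ \frac{S_0}{n - s} \sum_{i=1}^{n} \lambda_i^2 \sigma_i^2. \]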

25.6. In a feeding experiment there are given five different foods, each of which is 
available in four grades. It is desired to feed each animal with one grade of each food, 
but only one, so that a comparison may be made of the effect of the different grades of any 
particular food. Use the Graeco-Latin square to show how the feeding can be carried 
out.

25.7. A water diviner is to be taken to ten spots and asked to say whether water
is present below the surface. It is decided to choose five spots where water is known for 
certain to exist and five where it is known not to exist. The order in which the spots are 
to be presented is determined by spinning a coin, heads denoting water and tails not-water. 

The spinning of the coin results in the first five trials giving heads. Would you 
accept this result or spin again ? 

25.8. Show that a Latin square may be regarded as a three-way classification in
which $p^2$ members are not zero, but $p^3 - p^2$ members vanish. Derive the analysis of
variance for the Latin square from this approach and generalise it to the Graeco-Latin
square.



CHAPTER 26 


GENERAL THEORY OF SIGNIFICANCE-TESTS— (1 ) 

Hypotheses to be Considered 

26.1. The kind of hypothesis which we test in statistics is more restricted than the 
general scientific hypothesis. It is a scientific hypothesis that every particle of matter 
in the universe attracts every other particle, or that Homer was blind ; but these are not 
hypotheses such as arise for testing from the statistical viewpoint. A review of the various 
tests which have been introduced earlier in this book indicates that the great majority 
specify something about a population. Some merely assert a general fact such as “the
population is continuous” or “the population is rectangular”. Others are more definite,
as for instance “the population is normal and has a mean $\mu$”; and again others are less
definite in one direction and more definite in another, e.g. “the population has unit variance”.
It is also usually a part of the hypothesis that the sample from which the inference
is being made was obtained by a random process. 

26.2. Suppose we have a set of random variables $x_1, \ldots, x_n$. In the sample space
$W$ of $n$ dimensions the sample-point whose co-ordinates are $x_1, \ldots, x_n$ determines a point
$E$, say, with a distribution function which we may write as $P\{E\}$. If $w$ is any region in
$W$, we may derive the probability that $E$ falls in $w$, say $P\{E \in w\}$. Then we shall say that
any hypothesis concerning the law $P\{E \in w\}$ is a statistical hypothesis. If it determines
the law completely we shall call it simple. In the contrary case it is said to be composite.

For instance, in testing the significance of the mean of a sample of n, it is a statistical 
hypothesis that the parent is normal. This is composite, as also is the hypothesis that 
the parent is normal with mean $\mu$ or the hypothesis that the parent is normal with variance
$\sigma^2$. The hypothesis that the parent is normal with mean $\mu$ and variance $\sigma^2$ is simple because
then the parent is fully determined. 

Example 26.1 

In sampling from a population dichotomised into classes possessing the attributes
A or not-A, say in proportions $\varpi$ and $\chi$ $(= 1 - \varpi)$, the sampling distribution is the binomial
$(\chi + \varpi)^n$. This is completely determined by the value of $\varpi$, and hence a hypothesis as
to the value of $\varpi$ is simple. Such, for instance, would be the hypothesis that male and
female births occur in equal proportions. Similarly, in a multiple classification with proportions
$\varpi_1, \varpi_2, \ldots, \varpi_s$, a simple hypothesis would specify values for all the $\varpi$’s; if only
one were specified and $s$ were greater than two the hypothesis would be composite.

In sampling from a bivariate normal population characterised by two means, two 
variances and a correlation, a hypothesis about any one parameter would be composite, 
and similarly for a hypothesis concerning two, three or four parameters. Only if all five 
were specified in addition to the normality of the parent would the hypothesis be simple ; 
and this notwithstanding the fact that the sampling distribution of the means is inde- 
pendent of the other three parameters, and that of the correlation coefficient independent 
of the other four.

26.3. A hypothesis which determines the law $P\{E \in w\}$ completely except for $\nu$
parameters is sometimes said to have $\nu$ degrees of freedom. Such a hypothesis may be
regarded as an aggregate of simple hypotheses. For instance, the hypothesis that a population
is normal with mean $\mu$ is the aggregate, for all $\sigma^2$, of hypotheses that it is normal with
mean $\mu$ and variance $\sigma^2$.

26.4. The kind of argument we have used in testing hypotheses, for both large and
small samples, is of this character: assuming that the hypothesis is true, we can, with
any assigned probability $\alpha$, find a region $w_0$ in the sample space $W$ such that the probability
of $E$ falling in $W - w_0$ is $\alpha$. We call $W - w_0$ the region of acceptance and the complementary
domain $w_0$ the critical region. (This is the nomenclature of Chapter 19.) If our observed
$E$ falls in $w_0$ we reject the hypothesis; if not we accept it. As a rule, in practical cases,
our regions $w_0$ are determined by the values of some statistic such as $\bar{x}$ in testing the mean.


Errors of First and Second Kind 

26.5. In general, as we saw in Chapter 19, there are many possible regions of acceptance
for any given hypothesis and any given probability level $\alpha$. For all of them we shall
err in proportion $1 - \alpha$ of the cases in the long run by rejecting the hypothesis if $E$ falls
in the critical region — provided that the hypothesis is true. But what about the case when 
it is not true ? We cannot ignore this case, for its possible existence is the very reason for 
carrying out the test. It is of no use whatever to know merely what the test will do when 
the hypothesis is true without regard to its behaviour in the contrary case ; for if we are 
to consider only the events which happen when the hypothesis is true we have no right to 
use a test based on that assumption to reject it. 

By having regard to the behaviour of the test when the hypothesis is not true we are
able to lay down criteria for choosing among the various tests obeying the rule

\[ P\{E \in w_0 \mid H_0\} = 1 - \alpha, \tag{26.1} \]

where $H_0$ is the hypothesis. In fact we shall seek for the test which, while obeying (26.1),
minimises the risk of accepting $H_0$ when an alternative hypothesis $H_1$ is true and accordingly
$H_0$ is false. That is to say, we shall endeavour to find $w_0$ such that, in addition to (26.1),
we also have

\[ 1 - P\{E \in w_0 \mid H_1\} = \text{minimum}. \tag{26.2} \]

26.6. From a slightly different viewpoint we may say that there are two possible
errors in judging a statistical hypothesis:

(a) We may reject it when we ought to accept it, that is, when it is true.

(b) We may accept it when we ought to reject it, that is, when it is false.

These are known as errors of the first and second kind respectively. The error of the
first kind we can control exactly by setting up the proper region of acceptance determined
by $\alpha$. Errors of the second kind cannot be controlled in this way, but we can sometimes
calculate their probabilities, and in any case can try to reduce them to a minimum. This
is the fundamental idea, first given explicit expression by Neyman and E. S. Pearson,
which determines most of the work in the present and succeeding chapters.

26.7. The possibility of finding regions of acceptance obeying (26.2) clearly depends 
on a precise specification of what alternative hypotheses are under consideration. We 
had better emphasise the importance of this point. It is customary to speak, and even,
in a loose kind of way, to think of testing a hypothesis without reference to alternatives.
To take the case of testing for normality, we often say that the hypothesis under test is
that the population is normal without specifying what other form it might have. The
reader may say that the alternative he has in mind is merely the negation of the hypothesis,
namely that the population is not normal. But if so he will find it very difficult (in my
own view impossible) to justify any of his tests on a logical basis. He will calculate certain
statistics and accept the hypothesis if their values are consonant with the normal values ; 
but it will always be possible to find other populations for which the observed values are 
even closer to expectation. If agreement between theoretical and observed values is the 
criterion he should reject normality in favour of these alternative hypotheses. It is not 
until he specifies his alternatives and considers errors of the second kind that some firm 
foundation for intuitive processes begins to appear. 

26.8. Perhaps it may help to clarify the fundamental concepts of the present approach
if we consider a simple illustration where the hypothesis under test $H_0$ is simple and there
is only one alternative $H_1$ which is also simple. In Fig. 26.1 we show diagrammatically the
scatter of sample-points which would arise in samples of two, $x_1$ and $x_2$, the cluster on the
right being that due to $H_0$ and the one on the left to $H_1$. In practice, of course, the sampling
distributions are more usually continuous, but the dots will indicate roughly the condensation
of sample density round central values.

In determining the critical region we have to find an area in the $(x_1, x_2)$ plane such that
its “content” is $1 - \alpha$. Two possible areas are shown, $w_0$ being the area to the left of
the line PQ, and $w_0'$ the area between the lines AB and BC. In either case the proportion
in the critical regions of the frequency on hypothesis $H_0$ is $1 - \alpha$, and if we reject $H_0$ whenever
the sample-point falls in $w_0$ (and similarly for $w_0'$) we shall commit an error of the first
kind in proportion $1 - \alpha$ of the cases in the long run.

Consider errors of the second kind. By using the region $w_0$ we should reject $H_0$ (and
therefore accept $H_1$) every time the sample-point arose from $H_1$, that is to say in practically
all the cases where $H_1$ was true, since nearly all the sample-points arising from $H_1$ lie in
$w_0$. Errors of the second kind are therefore very rare. On the other hand, if we were to
use $w_0'$ we should accept $H_0$ every time a sample-point arose from $H_1$ but did not fall between
the lines AB and BC, that is to say fairly frequently. Clearly $w_0$ is the better critical
region and has a much smaller error of the second kind than $w_0'$.

26.9. It is to be noted that the argument does not depend on the relative frequencies
of occurrence of the hypotheses $H_0$ and $H_1$. This is generally true. There is no concealed
form of Bayes’ postulate in this approach.

26.10. When there are $n$ variates and several unknown parameters the geometrical representation
can be extended by imagining a sample-space $W$ of $n$ dimensions adjoined to
a parameter space with one dimension for each parameter. We cannot draw a picture of such
a case on a two-dimensional sheet of paper, but the geometrical imagery and terminology of the method
are frequently useful. A graphical illustration of a two-dimensional sample-space and
a one-dimensional parameter space has already been given in Fig. 19.3.

The Power Function 

26.11. If for a simple hypothesis $H_0$, (26.1) is true we define

\[ P\{E \in w_0 \mid H_1\} = \beta(H_1 \mid w_0) \tag{26.3} \]

as the power of the critical region $w_0$ with respect to $H_1$. Clearly the power is greatest
when the probability of an error of the second kind is least.

In the expression on the left of (26.3) we regard the probability that $E$ falls in $w_0$ as
dependent on $H_1$, the hypothesis alternative to $H_0$. In the expression on the right we have
regard to the power of the test for $H_1$ as dependent on $w_0$.

If there exists a particular region $w_0$ with greater power than any other region obeying
(26.1) we shall say that it is the best critical region, and the test based on it will be called
the most powerful test.

26.12. We proceed to consider in turn the following cases:

(a) $H_0$ simple; one alternative $H_1$ which is simple.

(b) $H_0$ simple; an alternative $H_1$ which is composite but can be regarded as an aggregate
of simple alternatives.

(c) $H_0$ and $H_1$ composite but expressible as aggregates of simple hypotheses.

Simple Hypotheses : One Simple Alternative

26.13. Suppose the parent population is continuous, so that the simultaneous distribution
of the $n$ sample values $x_1, \ldots, x_n$ is continuous; and let the frequency functions
of the sample values on hypotheses $H_0$ and $H_1$ be $p_0(x_1, \ldots, x_n)$ and $p_1(x_1, \ldots, x_n)$
respectively. Write $dx$ for the element $dx_1 \ldots dx_n$. Then we have

\[ \int_{w_0} p_0 \, dx = 1 - \alpha \tag{26.4} \]

and wish to maximise, for variations in the domain $w_0$, the integral

\[ \int_{w_0} p_1 \, dx. \tag{26.5} \]

This is a problem in the Calculus of Variations and is equivalent to maximising unconditionally
the integral

\[ \int_{w_0} \Bigl( p_1 - \frac{1}{k}\, p_0 \Bigr) dx \tag{26.6} \]

or, what is the same thing, to minimising

\[ \int_{w_0} (p_0 - k p_1) \, dx, \tag{26.7} \]

where $k$ is a constant to be determined by (26.4).

It is known that the condition for a stationary value of (26.7) is that, on the
boundary of $w_0$,

\[ p_0 - k p_1 = 0. \tag{26.8} \]

If the solution is a minimum we have, inside $w_0$,

\[ p_0 \le k p_1 \tag{26.9} \]

and outside $w_0$,

\[ p_0 > k p_1. \tag{26.10} \]

This solution to the problem is fairly obvious on general grounds. If $U$ is a function which
is sometimes positive and sometimes negative, with a line of demarcation where it is zero
(as must exist in virtue of continuity), we clearly minimise $\int U \, dx$ by taking into the region
$w_0$ all the points for which $U$ is negative and no more. This gives us (26.9) and (26.10),
and the boundary of $w_0$ is the locus for which $U$ vanishes. By convention we regard the
boundary as included in $w_0$, which accounts for the equality in (26.9) and its absence in
(26.10).


26.14. The conditions expressed by (26.8), (26.9) and (26.10) are sufficient as well
as necessary. For let $w_1$ be any other region for which

\[ \int_{w_1} p_0 \, dx = 1 - \alpha. \]

If $w_0$ and $w_1$ have a common part denote it by $w_{01}$. Then

\[ \int_{w_0 - w_{01}} p_0 \, dx = 1 - \alpha - \int_{w_{01}} p_0 \, dx = \int_{w_1 - w_{01}} p_0 \, dx \]

and hence, from (26.9) and (26.10),

\[ k \int_{w_0 - w_{01}} p_1 \, dx \ge \int_{w_0 - w_{01}} p_0 \, dx = \int_{w_1 - w_{01}} p_0 \, dx > k \int_{w_1 - w_{01}} p_1 \, dx. \]

Adding to both sides $k \int_{w_{01}} p_1 \, dx$, we have

\[ k \int_{w_0} p_1 \, dx > k \int_{w_1} p_1 \, dx, \tag{26.11} \]

and hence, for positive $k$, the power of $w_1$ is less than that of $w_0$ and the latter is the best
critical region.

Both in this section and implicitly in the last we have required $k$ to be positive. That
it must be so if $w_0$ is to exist emerges from (26.8), for $p_0$ and $p_1$ are essentially not negative,
and if $k$ were negative no solution for real variate-values would exist.
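The result of the last two sections has a simple discrete analogue which can be checked by direct enumeration. The sketch below is an illustration under stated assumptions and not the continuous case treated in the text: the sample space consists of eight hypothetical points, equally probable on $H_0$, and the probabilities on $H_1$ are invented. Every region with the required content on $H_0$ is examined, and the most powerful is compared with the region formed by taking the points with the smallest values of $p_0/p_1$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical discrete sample space of 8 points.
p0 = [Fraction(1, 8)] * 8                                   # probabilities on H0
p1 = [Fraction(w, 36) for w in (8, 7, 6, 5, 4, 3, 2, 1)]    # probabilities on H1

size = Fraction(2, 8)   # content of the critical region on H0 (1 - alpha in the text)

def content(region, p):
    """Total probability of the points in `region` under the distribution p."""
    return sum(p[i] for i in region)

# Every region whose content on H0 is exactly `size`.
candidates = [r for n in range(1, 9) for r in combinations(range(8), n)
              if content(r, p0) == size]

# The most powerful of them, found by brute force.
best = max(candidates, key=lambda r: content(r, p1))

# Region formed by taking points in increasing order of p0/p1, as in (26.9).
ranked = sorted(range(8), key=lambda i: p0[i] / p1[i])
lr_region = tuple(sorted(ranked[:2]))

print(best, lr_region, content(best, p1))
```

Exact fractions are used so that the equality of contents is tested without rounding error. With unequal probabilities on $H_0$ a region of exactly the required content need not exist, which is one reason why the text confines itself to the continuous case.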


Example 26.2 

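Consider the normal population

\[ dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\} \, dx, \qquad -\infty < x < \infty. \]

Let the hypothesis $H_0$ be that $\mu = a_0$, and the alternative that $\mu = a_1$. We have

\[ p_0 = \frac{1}{(2\pi)^{n/2}} \exp\Bigl\{ -\tfrac{1}{2} \sum_{j=1}^{n} (x_j - a_0)^2 \Bigr\}. \]

We can conveniently express this in terms of the sample mean $\bar{x}$ and the sample variance
$s^2$, obtaining for the density function

\[ p_0 = \frac{1}{(2\pi)^{n/2}} \exp\Bigl[ -\frac{n}{2} \{ (\bar{x} - a_0)^2 + s^2 \} \Bigr]. \]

A similar expression is found for $p_1$ and thus, for the boundaries of the best critical region,
we have

\[ \frac{p_0}{p_1} = \exp\Bigl[ -\frac{n}{2} \{ (\bar{x} - a_0)^2 - (\bar{x} - a_1)^2 \} \Bigr] = k. \]

This yields for the critical region

\[ (a_0 - a_1)(2\bar{x} - a_0 - a_1) \le \frac{2}{n} \log k, \]

or

\[ (a_0 - a_1)\,\bar{x} \le \tfrac{1}{2}(a_0^2 - a_1^2) + \frac{1}{n} \log k = (a_0 - a_1)\,\bar{x}_0, \text{ say}. \]

If $a_1 < a_0$ the region is then defined by

\[ \bar{x} \le \bar{x}_0, \]

but if $a_1 > a_0$ it is defined by

\[ \bar{x} \ge \bar{x}_0. \]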

The reader should compare the two cases on a diagram similar to that of Fig. 26.1. 
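The two cases of Example 26.2 may also be compared numerically. The following is a minimal sketch using the `NormalDist` class of the Python standard library; the function names and the particular values of $a_0$, $a_1$, $n$ and the content of the critical region are assumptions made for illustration. The boundary $\bar{x}_0$ is found from the sampling distribution of $\bar{x}$ on $H_0$, so that the constant $k$ never has to be evaluated.

```python
from math import sqrt
from statistics import NormalDist

def best_critical_region(a0, a1, n, size):
    """Boundary x0 of the best critical region for the mean of a unit-variance
    normal population, with the direction of the inequality on the sample mean."""
    sampling = NormalDist(mu=a0, sigma=1 / sqrt(n))   # sample mean on H0
    if a1 < a0:
        return sampling.inv_cdf(size), "<="
    return sampling.inv_cdf(1 - size), ">="

def power(a0, a1, n, size):
    """Probability that the sample mean falls in the critical region on H1."""
    x0, direction = best_critical_region(a0, a1, n, size)
    alternative = NormalDist(mu=a1, sigma=1 / sqrt(n))
    if direction == "<=":
        return alternative.cdf(x0)
    return 1 - alternative.cdf(x0)

x0, direction = best_critical_region(0.0, 1.0, n=25, size=0.05)
print(direction, round(x0, 4), round(power(0.0, 1.0, 25, 0.05), 4))
```

Here `size` plays the part of $1 - \alpha$ in the notation of the text. Interchanging the sign of $a_1 - a_0$ reflects the region about $a_0$ and leaves the power unchanged.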
Example 26.3

Consider again the normal population when the mean is known, say zero, but the
variance unknown, e.g.

\[ dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\Bigl( -\frac{x^2}{2\sigma^2} \Bigr) dx, \qquad -\infty < x < \infty. \]

We now find, for hypotheses $\sigma = \sigma_0$ and $\sigma = \sigma_1$,

\[ k = \frac{p_0}{p_1} = \Bigl( \frac{\sigma_1}{\sigma_0} \Bigr)^n \exp\Bigl[ -\frac{n}{2} (\bar{x}^2 + s^2) \Bigl( \frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2} \Bigr) \Bigr], \]

which yields, for the best critical region,

\[ (\bar{x}^2 + s^2)(\sigma_0^2 - \sigma_1^2) \le \frac{2}{n}\, \sigma_0^2 \sigma_1^2 \log\Bigl\{ k \Bigl( \frac{\sigma_0}{\sigma_1} \Bigr)^n \Bigr\} = v(\sigma_0^2 - \sigma_1^2), \text{ say}. \]

Thus our critical regions are defined by

\[ m_2' = \bar{x}^2 + s^2 \le v \quad \text{if } \sigma_1 < \sigma_0, \]

\[ m_2' = \bar{x}^2 + s^2 \ge v \quad \text{if } \sigma_1 > \sigma_0. \]

The best critical regions in the space $W$ are thus bounded by hyperspheres centred at the
origin. Whether we take the space inside or the space outside a particular hypersphere
as the critical region depends on the alternative hypothesis. The probabilities concerned
can be evaluated directly without evaluating the constants $k$ and $v$. In fact, the probability
of exceeding a given value of $n m_2'/\sigma_0^2 = \chi_0^2$ is obtainable from the $\chi^2$-distribution
with $n$ degrees of freedom, and hence the relation between $v$ and $\alpha$ can be
ascertained from the $\chi^2$-integral.

In this particular case we may find without difficulty the power of an alternative test
which would suggest itself on intuitive grounds. Suppose we find $n s^2/\sigma_0^2 = \chi_1^2$ from
the $\chi^2$-distribution corresponding to $n - 1$ degrees of freedom and probability level $\alpha$,
and use, instead of the hyperspheres centred at the origin, those centred at the sample mean,
$s^2 = \text{constant}$.

Suppose that the alternative is that $\sigma_1^2 = 1.1\,\sigma_0^2$. In testing $H_0$ for the alternative
$\sigma_1 > \sigma_0$ we should, for the test based on $v$, find $\chi_0^2$ and accept $\sigma_0$ if

\[ \frac{n m_2'}{\sigma_0^2} < \chi_0^2. \]

For instance, with $n = 5$, $1 - \alpha = 0.01$ we find $\chi_0^2 = 15.086$. The power of this test,
the complement of the probability of an error of the second kind, is

\[ \int_{w_0} p_1 \, dx = \int_{\chi_0^2/1.1}^{\infty} dF(\chi^2), \]

i.e. is obtained from the $\chi^2$-integral with argument $\chi_0^2/1.1 = 13.71$, giving $\beta(H_1 \mid w_0) = 0.018$.

On the other hand, had we used $\chi_1^2$ instead of $\chi_0^2$ we should have entered the table with
four degrees of freedom, giving 13.277. Divided by 1.1 this gives 12.07, resulting in a
probability of rather less than 0.017. This is the power of the second test and is lower
than that of the first test, as of course it must be since the latter has maximum power.
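The figures of Example 26.3 can be checked without tables. The sketch below is an illustration, not the method of the text: the function name is hypothetical, and the tail probability of $\chi^2$ is computed from the finite series which holds for integral degrees of freedom (an exponential sum when the number is even, a normal tail plus a finite sum when it is odd).

```python
from math import erfc, exp, pi, sqrt

def chi2_upper_tail(x, df):
    """Probability that chi-square with `df` degrees of freedom exceeds x."""
    if df % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return exp(-x / 2) * total
    # Odd df: normal tail plus sum of x^(r - 1/2) / (1 * 3 * ... * (2r - 1)).
    total, term = 0.0, sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return erfc(sqrt(x / 2)) + sqrt(2 / pi) * exp(-x / 2) * total

ratio = 1.1                                          # sigma_1^2 / sigma_0^2
power_origin = chi2_upper_tail(15.086 / ratio, 5)    # spheres centred at the origin
power_mean = chi2_upper_tail(13.277 / ratio, 4)      # spheres centred at the sample mean
print(round(power_origin, 4), round(power_mean, 4))
```

The two values agree with the 0.018 and the "rather less than 0.017" quoted above, the first test having the greater power.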


Simple Hypotheses : Families of Simple Alternatives 

26.15. Consider now the case where $H_0$ is simple but $H_1$ is composite and consists
of a family of simple alternatives. The most frequently occurring case is the one in which
we have a class of simple hypotheses $\Omega$ of which $H_0$ is one and $H_1$ comprises the remainder;
for example, the hypothesis $H_0$ may be that a mean has some value $\mu_0$ and the hypothesis
$H_1$ that it has some other value unspecified.

For each of these other values we may apply the foregoing results and find for each $\alpha$,
corresponding to any particular member of $H_1$, say $H_t$, a best critical region $w_t$. But this
region in general will vary from one $H_t$ to another. We obviously cannot determine a
different region for all the unspecified possibilities and are therefore led to inquire whether
there exists, among the family of best critical regions $w_t$, one which is the best for all of
them. Such a region is called the Uniformly Most Powerful and the test based on it the
Uniformly Most Powerful test, conveniently shortened to U.M.P. test.

26.16. Unfortunately, as we shall find below, the U.M.P. test does not usually
exist unless we restrict our family $\Omega$ in certain ways. Consider, for instance, the case
dealt with in Example 26.2. We found there that for $a_1 < a_0$ the best critical region for
a simple alternative was defined by

$$\bar{x} \le \bar{x}_0.$$

Now the boundaries of the regions determined by $\bar{x}$ = constant do not depend on $a_1$ and
can be found directly from the sampling distribution of $\bar{x}$ when the probability level $1 - \alpha$
is given. Consequently the regions defined by $\bar{x} \le \bar{x}_0$ are the same for all $a_1 < a_0$, and hence
the test is U.M.P. for the class of hypotheses that $a_1 < a_0$. It is difficult to see how a better
test could be devised, for, whatever $a_1$ subject to $a_1 < a_0$, the test controls errors of the first
kind and minimises those of the second.

However, if $a_1 > a_0$ the best critical regions are defined by $\bar{x} \ge \bar{x}_0$. Here again, if
our class $\Omega$ is confined to the values of $a_1$ greater than $a_0$ the test is U.M.P. But if $a_1$ can
be either greater or less than $a_0$ no U.M.P. test is possible. The reader will easily verify
for himself that the same is true for the test considered in Example 26.3.
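
The point of 26.16 may be checked numerically. The following is a minimal sketch only,
assuming for illustration a normal population with unit variance, a sample of 25 and a
critical region of probability 0.05 under $H_0$ (the quantity written $1 - \alpha$ in the text);
the function names and default values are hypothetical and are not taken from Example 26.2.
It compares the power of the lower-tail region $\bar{x} \le \bar{x}_0$ with that of the upper-tail region
$\bar{x} \ge \bar{x}_0$ for alternatives on either side of $a_0$.

```python
from math import sqrt
from statistics import NormalDist


def power_lower_tail(a1, a0=0.0, n=25, size=0.05):
    """Power of the region xbar <= x0 against the simple alternative a1.

    x0 is fixed by H0 alone: P(xbar <= x0 | a0) = size.
    """
    se = 1.0 / sqrt(n)                      # s.d. of xbar for unit variance
    x0 = NormalDist(a0, se).inv_cdf(size)   # the boundary does not involve a1
    return NormalDist(a1, se).cdf(x0)


def power_upper_tail(a1, a0=0.0, n=25, size=0.05):
    """Power of the region xbar >= x0, with P(xbar >= x0 | a0) = size."""
    se = 1.0 / sqrt(n)
    x0 = NormalDist(a0, se).inv_cdf(1.0 - size)
    return 1.0 - NormalDist(a1, se).cdf(x0)


if __name__ == "__main__":
    # Each region is powerful on one side of a0 and nearly powerless on the other.
    for a1 in (-0.6, -0.3, 0.3, 0.6):
        print(a1, round(power_lower_tail(a1), 4), round(power_upper_tail(a1), 4))
```

Under these assumptions each region has high power on one side of $a_0$ and almost none on
the other, which is why neither can serve as a single best region for alternatives on both
sides.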

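26.17. We now show formally that for a simple hypothesis depending on $\theta_0$ (the
value taken by the parameter $\theta$ defining a family of alternatives) no U.M.P. test exists
for both positive and negative values of $\theta - \theta_0$ if the frequency function $p(E \mid \theta)$ is
continuous, has everywhere a continuous derivative with respect to $\theta$ which does not vanish
identically, and admits of differentiation under the sign of integration over $W$.

Suppose that such a test does exist. Then for any $\theta$ we have, inside $w_0$,

$$p_0 \le k p,$$

which we may write

$$p(E \mid \theta) \ge h(\theta)\, p_0(E \mid \theta_0). \tag{26.12}$$

Likewise, for any point $E'$ on the boundary of $w_0$ we have

$$p(E' \mid \theta) = h(\theta)\, p_0(E' \mid \theta_0). \tag{26.13}$$

By hypothesis $p$ is differentiable in $\theta$ and hence so is $h$. Moreover, as $\theta \to \theta_0$, $h(\theta) \to 1$.
Hence if

$$\Delta = \theta - \theta_0$$

and primes denote differentiation with respect to $\theta$, we have

$$h(\theta) = 1 + \Delta\, [h'(\theta)]_{\theta_0 + \xi\Delta}, \qquad 0 < \xi < 1,$$

$$\phantom{h(\theta)} = 1 + \Delta \left[ \frac{p'(E' \mid \theta)}{p_0(E' \mid \theta_0)} \right]_{\theta_0 + \xi\Delta}. \tag{26.14}$$

Further we have

$$p(E \mid \theta) = p_0(E \mid \theta_0) + \Delta\, [p'(E \mid \theta)]_{\theta_0 + \eta\Delta}, \qquad 0 < \eta < 1. \tag{26.15}$$

Substituting in (26.12) from (26.14) and (26.15), we find

$$\Delta \left\{ [p'(E \mid \theta)]_{\theta_0 + \eta\Delta} - \frac{p_0(E \mid \theta_0)}{p_0(E' \mid \theta_0)}\, [p'(E' \mid \theta)]_{\theta_0 + \xi\Delta} \right\} \ge 0. \tag{26.16}$$

This is true for any $E$ and $E'$ and for all $\Delta$, whatever its sign, and hence the expression in
curly brackets vanishes as $\Delta \to 0$. Thus we have

$$[p'(E \mid \theta)]_{\theta_0} - \frac{p_0(E \mid \theta_0)}{p_0(E' \mid \theta_0)}\, [p'(E' \mid \theta)]_{\theta_0} = 0. \tag{26.17}$$

Similarly this equation may be shown to hold outside $w_0$, and hence it is true throughout $W$.
Now we have

$$\int_W p(E \mid \theta)\, dx = 1,$$

and hence, differentiating with respect to $\theta$ and putting $\theta = \theta_0$,

$$\int_W [p'(E \mid \theta)]_{\theta_0}\, dx = 0.$$

Substituting from (26.17), we have

$$\int_W \frac{p_0(E \mid \theta_0)}{p_0(E' \mid \theta_0)}\, [p'(E' \mid \theta)]_{\theta_0}\, dx = 0,$$

and hence

$$\frac{[p'(E' \mid \theta)]_{\theta_0}}{p_0(E' \mid \theta_0)} = 0. \tag{26.18}$$

Thus, from (26.17),

$$[p'(E \mid \theta)]_{\theta_0} = 0. \tag{26.19}$$

But this implies that the derivative of $p$ with respect to $\theta$ is identically zero at $\theta_0$, which
is contrary to hypothesis. The theorem follows.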

It may be noted that in deriving (26.17) from (26.16) we used the property that $\Delta$
may have either sign. If it can have only one sign, that is, if our class of admissible
alternatives is confined to the case when either $\theta < \theta_0$ or $\theta > \theta_0$, a U.M.P. test may exist; and
so we found in Examples 26.2 and 26.3.

Best Critical Regions and Likelihood 

26.18. Since on the boundary of a best critical region we have $p_0 - k p_1 = 0$, that
boundary is determined by the condition that on it the ratio of the likelihoods of the two
functions corresponding to $H_0$ and $H_1$ is constant.

Consider now the case where $\Omega$ comprises a set of alternatives varying according to
the parameter $\theta$, $H_0$ being one of them. In accordance with the principle of maximum
likelihood we should obtain, as the most likely value of $\theta$, the solution of

$$\frac{\partial p}{\partial \theta} = 0, \tag{26.20}$$

where $\theta$ is then expressed as a function of the variables. If this value is substituted in
$p$, we obtain the distribution with greatest likelihood which may be written $p(\Omega\ \text{max.})$.
The surfaces of constant likelihood are defined for this distribution by

$$p_0 - \lambda\, p(\Omega\ \text{max.}) = 0. \tag{26.21}$$

Now these surfaces are, in fact, the envelopes of the family, varying with $\theta$,

$$p_0 - k p_\theta = 0, \tag{26.22}$$

for to obtain the envelope we differentiate with respect to $\theta$, giving $\partial p_\theta / \partial \theta = 0$, and eliminate $\theta$,
leading back to (26.21). Thus, if there exists a best critical region (and hence a U.M.P.
test) for all permissible alternatives $H_\theta$, such a region will be the envelope with respect to
such alternatives and will therefore be identical with a region defined by (26.21); and
hence a test based on the principle of likelihood leads to best critical regions, if they exist.

If, as is more usual, there is no common best critical region, the ratio of the likelihood
of $H_0$ to that of any particular $H_\theta$ is $k$. The surface (26.21) remains the envelope of the
family of surfaces (26.22) for which $k = \lambda$.


Example 26.4 

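Consider once again the normal form, where both mean $\mu$ and variance $\sigma^2$ are specified
and the admissible alternatives are that they can have any values, subject of course to the
variance being positive. For any given $\mu_1$ and $\sigma_1$ the best critical region will be given by

$$\frac{p_0}{p_1} = \left( \frac{\sigma_1}{\sigma_0} \right)^n \exp \left[ -\frac{1}{2} \left\{ \frac{\sum (x - \mu_0)^2}{\sigma_0^2} - \frac{\sum (x - \mu_1)^2}{\sigma_1^2} \right\} \right] \le k.$$

This may be written in the form

$$\frac{\sigma_1^2 - \sigma_0^2}{\sigma_0^2 \sigma_1^2}\, n \{ (\bar{x} - \rho)^2 + s^2 \} \ge \text{constant},$$

where

$$\rho = \frac{\mu_0 \sigma_1^2 - \mu_1 \sigma_0^2}{\sigma_1^2 - \sigma_0^2}.$$

Thus, if $\sigma_1 > \sigma_0$ we have

$$(\bar{x} - \rho)^2 + s^2 \ge v^2, \text{ say};$$

and if $\sigma_1 < \sigma_0$ we have

$$(\bar{x} - \rho)^2 + s^2 \le v^2.$$

For any specified $\mu_1$ and $\sigma_1$ the best critical regions are bounded by hyperspheres with radius
$v\sqrt{n}$ and centre $x_1 = x_2 = \ldots = x_n = \rho$. Owing to the fact that $\rho$ varies with $\mu_1$ and
$\sigma_1$, there will not in general be a best common critical region and a U.M.P. test; and this
remains true even if we limit our alternatives to $\sigma_1 < \sigma_0$ and $\mu_1 < \mu_0$ or by similar
inequalities.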

We may regard $\bar{x}$ and $s$ as independent variables and represent the data on a two-way
plane $(\bar{x}, s)$. The best critical regions are then seen to be bounded by circles with
centre $(\rho, 0)$ and radius $v$. Fig. 26.2 (adapted from Neyman and Pearson, 1933c) illustrates
some of the contours for particular cases. A single curve, corresponding to a single
probability level, is shown in each case.

Cases (1) and (2): $\sigma_1 = \sigma_0$ and $\rho = \pm\infty$. The best critical region lies on the right
of the line (1) if $\mu_1 > \mu_0$ and on the left of (2) if $\mu_1 < \mu_0$. This is the case discussed in
Example 26.2.

Case (3): $\sigma_1 < \sigma_0$. Then $\rho = \mu_0 + \dfrac{\sigma_0^2}{\sigma_0^2 - \sigma_1^2} (\mu_1 - \mu_0)$ and the region lies
inside the semicircle marked (3).

Case (4): $\sigma_1 < \sigma_0$ and $\mu_1 = \mu_0$. The region is inside the semicircle (4).

Case (5): $\sigma_1 > \sigma_0$ and $\mu_1 = \mu_0$. The region is outside the semicircle (5).

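There is evidently no common best critical region for these cases. The regions of
acceptance, however, may have a common part, centred round the value $(\mu_0, \sigma_0)$, and we
should expect them to do so. Let us find the envelope of the best critical regions, which
is, of course, the same as that of the regions of acceptance. The likelihood ratio is

$$\frac{p_0}{p_1} = \left( \frac{\sigma_1}{\sigma_0} \right)^n \exp \left\{ -\frac{n s^2}{2} \left( \frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2} \right) - \frac{n (\bar{x} - \mu_0)^2}{2 \sigma_0^2} + \frac{n (\bar{x} - \mu_1)^2}{2 \sigma_1^2} \right\}.$$

The partial differentials of its logarithm with respect to $\mu_1$ and $\sigma_1$ equated to zero give

$$\frac{n (\bar{x} - \mu_1)}{\sigma_1^2} = 0, \qquad \frac{n}{\sigma_1} - \frac{n \{ s^2 + (\bar{x} - \mu_1)^2 \}}{\sigma_1^3} = 0,$$

whence we find $\mu_1 = \bar{x}$ and $\sigma_1 = s$ and the envelope is

$$\left( \frac{s}{\sigma_0} \right)^n \exp \left[ -\frac{n}{2} \left\{ \frac{(\bar{x} - \mu_0)^2 + s^2}{\sigma_0^2} - 1 \right\} \right] = \lambda.$$

The dotted curve in Fig. 26.2 shows one such envelope. It touches the boundaries of all
the critical regions which have the same likelihood-ratio $\lambda$. The space inside may be
regarded as a “good” region of acceptance and the space outside accordingly as a good
critical region.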

There is no best region for all alternatives, but the regions determined by envelopes 
of likelihood-ratio regions effect a sort of compromise by picking out and amalgamating 
parts of critical regions which are best for individual alternatives. 
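
The envelope of Example 26.4 is easily evaluated for a given sample. The following is
a minimal sketch, assuming only the standard result that the maximising alternative is
$\mu_1 = \bar{x}$, $\sigma_1 = s$ (with $s^2$ computed with divisor $n$); the function name is hypothetical.
It returns the ratio $\lambda = p_0 / p(\Omega\ \text{max.})$, which cannot exceed unity and equals unity only
when $\bar{x} = \mu_0$ and $s = \sigma_0$.

```python
from math import exp
from statistics import fmean, pstdev


def likelihood_ratio(sample, mu0, sigma0):
    """lambda = p0 / p(Omega max.) for a normal sample; H0: mean mu0, s.d. sigma0.

    lambda = (s/sigma0)^n * exp{-(n/2) [((xbar - mu0)^2 + s^2)/sigma0^2 - 1]}
    """
    n = len(sample)
    xbar = fmean(sample)
    s = pstdev(sample)                      # standard deviation with divisor n
    q = ((xbar - mu0) ** 2 + s ** 2) / sigma0 ** 2
    return (s / sigma0) ** n * exp(-0.5 * n * (q - 1.0))


if __name__ == "__main__":
    print(likelihood_ratio([-1.0, 1.0], 0.0, 1.0))            # xbar = mu0, s = sigma0
    print(likelihood_ratio([0.2, 1.5, -0.3, 2.2], 0.0, 1.0))  # a smaller ratio
```

A region of acceptance of the kind bounded by the dotted curve would then consist of the
samples for which this ratio is not less than a chosen constant.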

Example 26.5 

In the previous example we have supposed that the sample space W was the same for 
all admissible alternatives. This is quite legitimate, for we can always regard the domain 
of variation as infinite by supposing that p = 0 outside the range of the frequency-distri- 
bution of the variates. In the normal case, of course, p does not vanish anywhere, so that 
we are compelled to consider W as infinite. 

When, however, the sample-space for non-vanishing $p$ is bounded, special circumstances
may arise, and it is occasionally necessary to consider separately the different
discriminating regions. For instance, if the sample-spaces corresponding to $H_0$ and
$H_1$ are $W_0$ and $W_1$, it may happen that $W_0$ and $W_1$ have no common part when both $p_0$ and
$p_1$ are greater than zero. If so, we can distinguish between $H_0$ and $H_1$ with certainty.
If there is a common region $W_{01}$ then $W_1 - W_{01}$ should be included in the best critical
region, for to do so reduces the probability of errors of the first kind. But it does not follow
that this should constitute the whole of the critical region, for we might then commit too
many errors of the second kind, i.e. accept $H_0$ too often when $H_1$ is true. We may then
wish to add to $W_1 - W_{01}$ a region $w_{00}$, making $w_0$ altogether, such that $w_{00}$ lies inside $W_{01}$
and $P_0(E \in w_{00}) = P_0(E \in w_0) = 1 - \alpha$. This controls the first kind of error to level $\alpha$
and reduces the second kind of error.

Consider the population

$$p(x) = \begin{cases} 1/b, & a - \tfrac{1}{2} b \le x \le a + \tfrac{1}{2} b, \\ 0, & \text{elsewhere.} \end{cases}$$

Suppose a sample of $n$ to have been drawn from a population of this kind where $b$ is known.
We wish to test whether $a$ has some value $a_0$ as against the alternative $a_1$.

The sample-spaces $W_0$ and $W_1$ are hypercubes centred at $a_0$ and $a_1$. If they have
a common part $W_{01}$ the probabilities $p_0$ and $p_1$ in that part are both proportional to the
volume and $p_0 / p_1 = 1$ everywhere in the region. If, then, we take any region $w_{00}$ of
content $1 - \alpha$ in $W_{01}$ and add it to $W_1 - W_{01}$ we get a best critical region, and there are clearly
infinitely many such.

For the admissible alternatives the hypercube $W_1$ will move along the long diagonal
$x_1 = x_2 = \ldots = x_n$ as $a_1$ varies, and we cannot always find a common region of size $1 - \alpha$
to form $w_{00}$. By taking such a region as a hypercube of side $b (1 - \alpha)^{1/n}$, however, fitted
into one of the corners of $W_0$ lying on the long diagonal, we “nearly” obtain such an object
since this region provides what is required so long as $W_0$ and $W_1$ have a common part of
content $1 - \alpha$. Which corner we choose depends on whether the hypothesis is $a_1 > a_0$
or $a_0 > a_1$.

Relation between U.M.P. Tests and Sufficient Estimators

26.19. It was thought at one time that the existence of a set of U.M.P. tests for
a continuous range of admissible alternatives involved the existence of a sufficient estimator
for the parameter concerned. This does not appear to be true in full generality, but is
so in nearly all the cases occurring in statistical practice. We will prove a theorem on the
subject:

If a system of U.M.P. tests exists and if any point in the sample-space lies on the
boundary of a best critical region, then a sufficient estimator exists for the parameter whose
variation provides the admissible alternatives.*

It is enough to show that for an arbitrary point $E$ we have

$$p_1(E) = h(t, \theta)\, p_0(E), \tag{26.23}$$

for then $t$ is sufficient for $\theta$ by definition. Now we know that on the boundary of a critical
region we have

$$\frac{p_1(E)}{p_0(E)} = \frac{1}{k} = h, \text{ say},$$

where $h$ varies with the $x$'s and with $\theta$. We show that $h$ has the form $h(t, \theta)$ by defining
a function $t$ and showing that if $t$ has the same value at any two points $E_1$ and $E_2$, then

$$\frac{p_1(E_1)}{p_0(E_1)} = \frac{p_1(E_2)}{p_0(E_2)}$$

for all $\theta$.


26.20. For this purpose we require a lemma to the following effect: if a set of U.M.P.
tests exists, it will be said to be ordered if the condition $\alpha_1 > \alpha_2$ implies that the critical
region $w(\alpha_1)$ is included in the region $w(\alpha_2)$; and if a set of U.M.P. tests exists but is not
ordered we can always find another set which is.

$w(\alpha_1)$ and $w(\alpha_2)$ may include parts of $W$ where $p$ vanishes. Let the remaining parts
be $v(\alpha_1)$ and $v(\alpha_2)$ and, if $v_0$ is the common part of these regions, write

$$\left. \begin{aligned} v(\alpha_1) &= v_0 + v' \\ v(\alpha_2) &= v_0 + v'' \end{aligned} \right\} \tag{26.24}$$

where $v_0$, $v'$ and $v''$ have no common points. Now for any value of $\theta$ and for any $E$ in $w(\alpha_1)$,
and therefore in $v'$, there is an $h_1$ such that

$$p_1(E) \ge h_1 p_0(E) \text{ in } v'$$

$$\phantom{p_1(E)} \le h_1 p_0(E) \text{ outside, and therefore in } v''.$$

Similarly, within $w(\alpha_2)$ and hence within $v''$ we have an $h_2$ such that

$$p_1(E) \ge h_2 p_0(E) \text{ in } v''$$

$$\phantom{p_1(E)} \le h_2 p_0(E) \text{ in } v'.$$

It follows that, from the inequalities deriving from $v''$, $h_1 \ge h_2$, and similarly, from $v'$,
$h_2 \ge h_1$. Hence $h_1 = h_2 = h$, say, and

$$p_1(E) = h\, p_0(E) \tag{26.25}$$

within $v'$ and $v''$ for any $\theta$.


* The theorem remains true if there is a set of points of measure zero for which the condition as to
boundaries is not fulfilled. It is also true for several parameters, as may be seen by an easy
generalisation of the argument. See Neyman and Pearson (1936a).

Now take

$$u(\alpha_1) = v_0 + v''', \tag{26.26}$$

where $v'''$ is a part of $v''$, such that

$$\int_{u(\alpha_1)} p_0\, dx = 1 - \alpha_1. \tag{26.27}$$

This is always possible, for the integral of $p_0$ over $v_0 + v''$ is $1 - \alpha_2$, which is greater than
$1 - \alpha_1$. It follows from (26.27) and the first equation of (26.24) that

$$\int_{v'''} p_0\, dx = \int_{v'} p_0\, dx. \tag{26.28}$$

Now put

$$w'(\alpha_1) = w_0 + u(\alpha_1) = w_0 + v_0 + v''',$$

where $w_0$ is the part of $W$ for which $p_0 = 0$. Then from (26.27)

$$\int_{w'(\alpha_1)} p_0\, dx = 1 - \alpha_1.$$

Further, $w'(\alpha_1)$ is a best critical region with respect to admissible alternatives, for (26.25)
and (26.28) imply that

$$\int_{v'''} p_1\, dx = \int_{v'} p_1\, dx,$$

and hence

$$\int_{w'(\alpha_1)} p_1\, dx = \int_{w(\alpha_1)} p_1\, dx.$$

Finally, $w'(\alpha_1)$ is wholly included in $w(\alpha_2)$.

We have therefore replaced the region $w(\alpha_1)$ by another region $w'(\alpha_1)$ with the same
properties except that it is included in $w(\alpha_2)$. The lemma follows.


26.21. To return now to the main proposition, let $E$ be any point of $W$. If it belongs
to only one boundary of a best critical region with content $1 - \alpha$ we put $t(E) = 1 - \alpha$.
If it belongs to more than one, we put $t(E)$ equal to the mean between the upper and lower
bounds of values of $1 - \alpha$ for which the boundaries include $E$. In virtue of the lemma,
this implies that whatever the value of $1 - \alpha$ between these bounds, the corresponding
boundary must contain $E$.

Thus $t$ is defined everywhere. Further, if it has the same value at two points $E_1$ and
$E_2$ these points must lie on the same boundary. It follows that on this boundary

$$\frac{p_1(E_1)}{p_0(E_1)} = \frac{p_1(E_2)}{p_0(E_2)}$$

and hence the theorem is proved.

The converse is not generally true, but one has to exercise some ingenuity and import 
some artificiality to construct examples where it fails. Cf. Exercises 26.3 and 26.4. 
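
The relation (26.23) may be illustrated numerically. The following is a minimal sketch,
assuming for illustration a normal population with unit variance whose mean is $\theta$ (an
assumption made here, not a case worked in this section); the function name and the two
samples are hypothetical. Two sample points with the same mean give the same ratio
$p_1 / p_0$ whatever the alternative $\theta$, so that the ratio is of the form $h(t, \theta)$ with $t$ a function of
the sample mean.

```python
from math import exp


def ratio_p1_p0(sample, theta, theta0=0.0):
    """p1(E)/p0(E) for a normal sample with unit variance: mean theta against theta0."""
    log_ratio = sum(-(x - theta) ** 2 / 2.0 + (x - theta0) ** 2 / 2.0 for x in sample)
    return exp(log_ratio)


# Two different sample points E1 and E2 with the same mean (2.0).
E1 = [0.0, 2.0, 4.0]
E2 = [1.0, 2.0, 3.0]

if __name__ == "__main__":
    for theta in (-1.0, 0.5, 2.0):
        print(theta, ratio_p1_p0(E1, theta), ratio_p1_p0(E2, theta))
```

A sample point with a different mean gives, in general, a different ratio, so that under
these assumptions the ratio discriminates between sample points only through the mean.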


Composite Hypotheses 


26.22. We shall consider a class $\Omega$ of admissible hypotheses depending on $r + s$
parameters $\theta_1 \ldots \theta_r, \theta_{r+1} \ldots \theta_{r+s}$ and shall regard the hypothesis $H_0$ under test as one of
this class. A composite hypothesis of $r$ degrees of freedom is one for which $s$ of the
parameters, say $\theta_{r+1} \ldots \theta_{r+s}$, are specified, the hypotheses determining the distribution
apart from the unspecified parameters. For example, the hypothesis that a population
is normal with specified mean, nothing being supposed about the variance, is a composite
hypothesis of one degree of freedom. It will be assumed that any admissible simple
alternative is given by specifying the other $r$ parameters $\theta_1 \ldots \theta_r$ and that there is a common
sample-space $W$ for all such alternatives.

Regions Similar to the Sample Space 

26.23. In order to test the composite hypothesis $H_0$ we need in the first place to
control errors of the first kind by determining a critical region $w$, such that

$$\int_w p_0\, dx = 1 - \alpha. \tag{26.29}$$

This, however, differs from the simple case in that $p_0$ can vary according to the unknown
parameters, and to be certain of controlling the error we must be able to find $w$ such that
(26.29) is true whatever $\theta_1 \ldots \theta_r$. If this can be done we shall call the region $w$ similar
to the sample-space $W$ and shall speak of $1 - \alpha$ as its size.

The problem of testing composite hypotheses then becomes one of (a) finding the
similar regions, and (b) selecting from among those regions the one which minimises the
second kind of error for a simple admissible alternative $H_t$. If this is the same for all
$H_t$ we shall have a common best critical region.


26.24. We consider in the first place the composite hypothesis with one degree of
freedom. The general problem of finding similar regions in such a case has not been solved,
but a solution is possible in one important class of case, namely, that for which

(a) $p_0$ is indefinitely differentiable with respect to $\theta_1$ for almost all values of $\theta_1$,

(b) the function $p_0$ obeys the relation

$$\phi' = A + B\phi, \tag{26.30}$$

where

$$\phi = \frac{\partial \log p_0}{\partial \theta_1}, \qquad \phi' = \frac{\partial \phi}{\partial \theta_1}, \tag{26.31}$$

and $A$ and $B$ depend on $\theta_1$ but not on the $x$'s. In particular the normal distribution
is of this type.

Under conditions (a) and (b) it follows that for $w$ to be similar to $W$ it is necessary and
sufficient that

$$\int_w \frac{\partial^k p_0}{\partial \theta_1^k}\, dx = 0, \qquad k = 1, 2, \ldots \tag{26.32}$$

Let $w$ be a region for which (26.32) is true. Then for $k = 1$ and 2 we have

$$\int_w p_0 \phi\, dx = 0,$$

$$\int_w p_0 (\phi^2 + \phi')\, dx = 0.$$

In virtue of (26.30), this last may be written

$$\int_w p_0 (\phi^2 + A + B\phi)\, dx = 0,$$

whence

$$\int_w p_0 \phi^2\, dx = -A \int_w p_0\, dx = -A (1 - \alpha). \tag{26.33}$$

Differentiating (26.33) with respect to $\theta_1$ and using previous results, we find

$$\int_w p_0 \phi^3\, dx = (2AB - A') (1 - \alpha), \tag{26.34}$$

and generally

$$\int_w p_0 \phi^k\, dx = (1 - \alpha)\, \psi_k(\theta_1), \tag{26.35}$$

where $\psi_k(\theta_1)$ is a function of $\theta_1$ only, and is therefore independent of $w$. Now (26.32) is
true for $W$ as well as for $w$, and we find

$$\int_W p_0 \phi^k\, dx = \psi_k(\theta_1), \tag{26.36}$$

so that

$$\frac{1}{1 - \alpha} \int_w p_0 \phi^k\, dx = \int_W p_0 \phi^k\, dx. \tag{26.37}$$

Now consider the random variable $\phi$. Since $p_0$ integrated through $w$ is equal to $1 - \alpha$,
we may regard $p_0 / (1 - \alpha)$ as a frequency function defined in $w$. It follows from (26.37) that
the moments of $\phi$ in this domain are the same as those of $\phi$ in $W$. Consequently, if the
moments determine the distribution uniquely, the distributions of $\phi$ are identical.

Hence we may use the hypersurfaces $\phi$ = constant to set up similar regions. The
space $W$ may be imagined as composed of shells of infinite thinness bounded by these
hypersurfaces. If we determine an “area” on one of these shells equal to $1 - \alpha$ times
its area in $W$, the totality of such areas will constitute a region $w$ of size $1 - \alpha$; and since
this will be so irrespective of $\theta_1$ the region $w$ is similar to $W$.


26.25. When similar regions are determined by the above method we have to find
the best critical region from among them. Let $H_t$ be a simple admissible alternative.
We require to find from the regions $w$ a region $w_0$ such that

$$\int_{w_0} p_t\, dx = \text{maximum}. \tag{26.38}$$

We now show that this is equivalent to maximising

$$\int_{w(\phi)} p_t\, dw(\phi), \tag{26.39}$$

subject to

$$\int_{w(\phi)} p_0\, dw(\phi) = (1 - \alpha) \int_{W(\phi)} p_0\, dW(\phi). \tag{26.40}$$


Here $w(\phi)$ means the element of $w$ for constant $\phi$, the “shell” of the previous section.
The object of this is to reduce our present case to that of simple hypotheses. We take
$\phi$ as a new variable and consider together the remaining variables (which amounts to
determining similarity of $w$ and $W$ in each separate shell between $\phi$ and $\phi + d\phi$, as in the previous
section), and are thus left with regions dependent on $\phi$. Equation (26.39) then requires
that the probability of the second kind of error in each shell must be a minimum, subject
to the control of the first kind asserted by (26.40).

Suppose that (26.39) were not maximised. There would then exist a set of values of
$\phi$ for each of which we could determine a region $v(\phi)$ such that

$$\int_{v(\phi)} p_0\, dv(\phi) = (1 - \alpha) \int_{W(\phi)} p_0\, dW(\phi) \tag{26.41}$$

and

$$\int_{v(\phi)} p_t\, dv(\phi) > \int_{w_0(\phi)} p_t\, dw_0(\phi). \tag{26.42}$$

Let $E$ be this set of values of $\phi$ and $CE$ the remaining set. We prove our result by
obtaining a contradiction, namely by defining a region $v$ which is similar to $W$, and such that

$$\int_v p_t\, dx > \int_{w_0} p_t\, dx, \tag{26.43}$$

which contradicts (26.38).

Take as $v$ the shells of hypersurfaces (1) in $CE$ which are identical with $w_0(\phi)$ and
(2) in $E$ which satisfy (26.42). Now

$$\int_v p_t\, dx = \int d\phi \int_{v(\phi)} p_t\, dv(\phi)$$

and

$$\int_{w_0} p_t\, dx = \int d\phi \int_{w_0(\phi)} p_t\, dw_0(\phi).$$

Hence

$$\int_v p_t\, dx - \int_{w_0} p_t\, dx = \int_{E + CE} d\phi \left\{ \int_{v(\phi)} p_t\, dv(\phi) - \int_{w_0(\phi)} p_t\, dw_0(\phi) \right\}$$

$$= \int_E d\phi \left\{ \int_{v(\phi)} p_t\, dv(\phi) - \int_{w_0(\phi)} p_t\, dw_0(\phi) \right\} > 0, \tag{26.44}$$

which is the contradiction required.


26.26. Thus our problem is reduced to that of finding, in the shells $\phi$ = constant,
portions $w_0(\phi)$ which maximise the integral of $p_t$. We have, so to speak, brought the
problem down one dimension by locating it in shells instead of dealing with it throughout
the spaces $w$ and $W$. It now becomes that of a simple hypothesis in $(n - 1)$ dimensions,
and the best critical region is the one for which

$$p_t \ge \frac{1}{k}\, p_0, \tag{26.45}$$

where $k$ is a function of $\phi$. The sum of these regions for the various values of $\phi$ gives us
the complete solution to the problem, and if this sum has boundaries which are independent
of $H_t$ we have a common best critical region and a U.M.P. test.


Example 26.6: “Student’s” Hypothesis

A single sample is taken from a normal population

$$dF = \frac{1}{\sigma \sqrt{2\pi}} \exp \left\{ -\frac{(x - \mu)^2}{2 \sigma^2} \right\} dx$$

with unspecified $\sigma$. We have then one degree of freedom, $\theta_1 = \sigma$, and the hypothesis $H_0$
is that $\mu = \mu_0$, say.

We find

$$\phi = \frac{\partial}{\partial \sigma} \log p_0 = -\frac{n}{\sigma} + \frac{\sum (x - \mu_0)^2}{\sigma^3} = -\frac{n}{\sigma} + \frac{n}{\sigma^3} \{ (\bar{x} - \mu_0)^2 + s^2 \},$$

$$\phi' = \frac{\partial \phi}{\partial \sigma} = \frac{n}{\sigma^2} - \frac{3 \sum (x - \mu_0)^2}{\sigma^4} = -\frac{2n}{\sigma^2} - \frac{3}{\sigma}\, \phi.$$

Condition (26.30) is satisfied, and $\phi$ is constant over the hypersurfaces

$$\sum (x - \mu_0)^2 = n \{ (\bar{x} - \mu_0)^2 + s^2 \} = \text{constant}.$$

The hypersurfaces are hyperspheres in $W$. To construct a similar region we have merely
to pick out a region of size $1 - \alpha$ on each shell and to amalgamate them. In our present
case this is particularly easy because $p_0$ is constant over the shells and we need only pick
out areas on each shell bearing to the area of the hypersphere the ratio $1 - \alpha$.

These areas need not be of the same shape or similarly situated. By selecting them
in different ways an infinite variety of regions may be constructed. We have to find the
best for an alternative simple hypothesis $\sigma = \sigma_1$, $\mu = \mu_1$.

The condition (26.45) becomes

$$\frac{1}{\sigma_1^n} \exp \left[ -\frac{n}{2 \sigma_1^2} \{ (\bar{x} - \mu_1)^2 + s^2 \} \right] \ge \frac{1}{k \sigma^n} \exp \left[ -\frac{n}{2 \sigma^2} \{ (\bar{x} - \mu_0)^2 + s^2 \} \right].$$

As we are dealing with regions which are similar with regard to $\sigma$, we may put $\sigma = \sigma_1$
and find

$$\bar{x} (\mu_1 - \mu_0) \ge \tfrac{1}{2} (\mu_1^2 - \mu_0^2) - \frac{\sigma_1^2}{n} \log k = (\mu_1 - \mu_0)\, k_1, \text{ say},$$

where $k_1 = k_1(\phi)$. Thus we find, for the boundary of $w_0(\phi)$,

$$\text{if } \mu_1 > \mu_0, \quad \bar{x} \ge k_1(\phi); \qquad \text{if } \mu_1 < \mu_0, \quad \bar{x} \le k_1(\phi),$$

where $k_1$ has to be chosen so as to satisfy

$$\int_{w(\phi)} p_0\, dw(\phi) = (1 - \alpha) \int_{W(\phi)} p_0\, dW(\phi).$$

Thus on any particular shell the “cap” cut off by the hyperplane $\bar{x}$ = constant must have
area $1 - \alpha$ times that of the shell and hence must subtend the same solid angle at the origin.
Consequently the boundaries lie on a right hypercircular cone through the point whose
co-ordinates are all equal to $\mu_0$ and whose axis is perpendicular to $\bar{x} = 0$, namely the line
$x_1 = x_2 = \ldots = x_n$. For each $\alpha$ there will be a different cone. If $\mu_1 > \mu_0$ the cones will be
in the positive quadrant and in the contrary case in the negative quadrant.

Furthermore, these regions are independent of the particular values of $\mu_1$ and $\sigma_1$. Thus for
the class of hypothesis $\mu_1 > \mu_0$ or $\mu_1 < \mu_0$ (but not both together) the common best critical
regions and U.M.P. tests exist.

Finally we have to evaluate $\alpha$ in terms of the sample values determining the critical
cones. We have already seen in Example 10.6 (vol. I, p. 239) that if $z = (\bar{x} - \mu_0)/s$ the
frequency inside the cone is

$$\frac{1}{B \left( \frac{n - 1}{2}, \frac{1}{2} \right)} \int \frac{dz}{(1 + z^2)^{n/2}},$$

the integral being taken over the values of $z$ lying inside the cone.

Thus “Student’s” test, which we have previously considered on more or less intuitive
grounds, is now seen to be the best in the sense of the theory herein developed, for the
admissible class $\mu_1 > \mu_0$ or for that $\mu_1 < \mu_0$.
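
The relation $\phi' = A + B\phi$ on which Example 26.6 depends may be verified numerically. The
following is a minimal sketch, assuming the normal form with $\theta_1 = \sigma$ and taking
$A = -2n/\sigma^2$, $B = -3/\sigma$ as obtained by differentiating $\log p_0$; the function names and the
sample values are hypothetical. The derivative of $\phi$ is approximated by a central difference
and compared with $A + B\phi$.

```python
def phi(xs, mu0, sigma):
    """phi = d(log p0)/d(sigma) for a normal sample with mean mu0."""
    n = len(xs)
    ss = sum((x - mu0) ** 2 for x in xs)
    return -n / sigma + ss / sigma ** 3


def phi_prime_numeric(xs, mu0, sigma, h=1e-5):
    """Central-difference approximation to d(phi)/d(sigma)."""
    return (phi(xs, mu0, sigma + h) - phi(xs, mu0, sigma - h)) / (2.0 * h)


def phi_prime_from_relation(xs, mu0, sigma):
    """A + B*phi with A = -2n/sigma^2 and B = -3/sigma, as in (26.30)."""
    n = len(xs)
    A = -2.0 * n / sigma ** 2
    B = -3.0 / sigma
    return A + B * phi(xs, mu0, sigma)


if __name__ == "__main__":
    xs = [0.3, 1.2, -0.7, 2.1]      # illustrative sample values
    print(phi_prime_numeric(xs, 0.5, 1.3), phi_prime_from_relation(xs, 0.5, 1.3))
```

The two values agree to within the error of the numerical differentiation, and $A$ and $B$
involve $\sigma$ and $n$ only, not the $x$'s, as condition (b) of 26.24 requires.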


Example 26.7 

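Consider a sample from the normal population with unspecified mean, the hypothesis
being that $\sigma = \sigma_0$. We now find

$$\phi = \frac{\partial}{\partial \mu} \log p_0 = \frac{n (\bar{x} - \mu)}{\sigma_0^2}, \qquad \phi' = \frac{\partial \phi}{\partial \mu} = -\frac{n}{\sigma_0^2},$$

so that (26.30) is satisfied.

The hypersurfaces $\phi$ = constant are the hyperplanes $\bar{x}$ = constant, and any regions
of size $1 - \alpha$ on these hyperplanes will provide similar regions $w$. The condition $p_t \ge p_0 / k$
will be found to reduce to

$$s^2 (\sigma_0^2 - \sigma_1^2) \le \sigma_1^2 (\bar{x} - \mu)^2 - \sigma_0^2 (\bar{x} - \mu_1)^2 + 2 \sigma_0^2 \sigma_1^2 \left\{ \log \frac{\sigma_0}{\sigma_1} + \frac{1}{n} \log k \right\} = (\sigma_0^2 - \sigma_1^2)\, k_1, \text{ say}.$$

If $\sigma_1 > \sigma_0$ we have $s^2 \ge k_1(\phi)$,

and if $\sigma_1 < \sigma_0$ we have $s^2 \le k_1(\phi)$.

Since $s^2$ is independent of $\bar{x}$, $k_1$ will be a function of $\alpha$ and $n$ only. The best critical
regions are those given by $s^2 \ge s_0^2$ and $s^2 \le s_0^2$, as the case may be, and the appropriate
values of $s_0$ corresponding to $\alpha$ may be found from the known distribution of $s^2$. The
critical regions are hypercylinders, and again there are two sets of best common critical
regions, according as $\sigma_1 > \sigma_0$ or $\sigma_1 < \sigma_0$.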


Composite Hypotheses : Several Degrees of Freedom 

26.27. As a preliminary to extending the theory for one degree of freedom to the
case of several degrees, we note that if a region $w$ is similar to $W$ with regard to $\theta_1 \ldots \theta_r$
jointly, then it is so for each of them separately; and conversely. The direct result is
obvious and the converse follows in this way (we need prove it only for $r = 2$ because
the rest follows step by step). If

$$\int_w p\, dx = 1 - \alpha$$

is true for $\theta_2, \theta_3 \ldots \theta_r$ independently of $\theta_1$, and for $\theta_1, \theta_3 \ldots \theta_r$ independently of $\theta_2$,
then it is true for any values of $\theta_1$ and $\theta_2$ and any other fixed values of $\theta_3 \ldots \theta_r$; and
hence it is true independently of $\theta_1$ and $\theta_2$ together.

26.28. An additional preliminary requirement is the concept of independence of
a family of surfaces of a parameter. Suppose

$$f_j(x_1 \ldots x_n, \theta) = C_j, \qquad j = 1, 2 \ldots k < n \tag{26.46}$$

represents a family of surfaces, where $\theta$ and the $C$'s are variable parameters. Let
$S(\theta, C_1 \ldots C_k)$ be the intersection of these surfaces, or, if $k = 1$, the surfaces themselves.
Consider the family obtained by fixing $\theta$ and allowing the $C$'s to vary. Then if any surface
of this family for $\theta_1$ can also be obtained from a second family for $\theta_2$ we shall say that the
family is independent of $\theta$. We get the same aggregate of intersections however $\theta$ is chosen.
For example, if

$$f_1 = (x_1 - \theta)^2 + (x_2 - \theta)^2 + (x_3 - \theta)^2 = C_1$$

and

$$f_2 = x_1 + x_2 + x_3 = C_2,$$

the family $S$ consists of circles in planes at right angles to the line $x_1 = x_2 = x_3$ and having
their centres on that line. This is true however $\theta$ is chosen, and $S$ is therefore
independent of $\theta$.
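
The example of 26.28 may be checked directly. The following is a minimal sketch; the
function names and the numerical values of $C_1$, $C_2$ and $\theta$ are hypothetical choices made
for illustration. Points are generated on the circle $f_1 = C_1$, $f_2 = C_2$ for one value of $\theta$,
and $f_1$ is then evaluated at the same points for a different $\theta$: it is again constant, so the
same circle belongs to the second family with a different constant.

```python
from math import cos, pi, sin, sqrt


def f1(x, theta):
    return sum((xi - theta) ** 2 for xi in x)


def f2(x):
    return sum(x)


def circle_points(c1, c2, theta, m=8):
    """m points on the intersection of f1 = c1 and f2 = c2 for the given theta."""
    centre = c2 / 3.0                           # the centre lies on x1 = x2 = x3
    r2 = c1 - 3.0 * (centre - theta) ** 2       # squared radius of the circle
    if r2 < 0:
        raise ValueError("the surfaces do not intersect")
    r = sqrt(r2)
    u = (1 / sqrt(2), -1 / sqrt(2), 0.0)        # orthonormal basis of the plane,
    v = (1 / sqrt(6), 1 / sqrt(6), -2 / sqrt(6))  # both at right angles to (1, 1, 1)
    points = []
    for j in range(m):
        t = 2.0 * pi * j / m
        points.append(tuple(centre + r * (cos(t) * u[i] + sin(t) * v[i]) for i in range(3)))
    return points


if __name__ == "__main__":
    pts = circle_points(5.0, 3.0, theta=0.0)
    print([round(f1(p, 0.0), 9) for p in pts])  # all equal to C1 = 5
    print([round(f1(p, 1.7), 9) for p in pts])  # constant again for theta = 1.7
```

With these values the second constant is $2 + 3(1 - 1.7)^2 = 3.47$, the squared radius of the
circle being 2 and its centre lying at the point $(1, 1, 1)$.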


26.29. Under certain restrictive conditions similar to those of 26.24 it is now possible
to find solutions to the problem of determining best critical regions. We assume

(1) that $\dfrac{\partial^k p_0}{\partial \theta_j^k}$ exists almost everywhere for all $k$ and $j = 1 \ldots r$;

(2) that if

$$\phi_j = \frac{\partial \log p_0}{\partial \theta_j} \quad \text{and} \quad \phi_j' = \frac{\partial \phi_j}{\partial \theta_j},$$

then

$$\phi_j' = A_j + B_j \phi_j; \tag{26.47}$$

(3) that the family of surfaces given by the intersections of $\phi_j = C_j$ is independent of
$\theta_j$ for $j = 1 \ldots r$.

Subject to these conditions (which are sufficient but not necessary) similar regions exist.
Consider any two surfaces $\phi_1$ and $\phi_2$. Since $w$ is similar with respect to $\theta_1$ alone, we may
find surfaces $\phi_1$ = constant and

$$\int_{w(\phi_1)} p\, dw(\phi_1) = (1 - \alpha) \int_{W(\phi_1)} p\, dW(\phi_1). \tag{26.48}$$

In accordance with assumption (3), the family of surfaces $\phi_1 = C_1$ is independent of $\theta_2$.
Thus if $\theta_2$ varies, $W(\phi_1)$ and $w(\phi_1)$ will not vary, though perhaps they may correspond to
other values of $C_1$. Furthermore, (26.48) is true regardless of $\theta_2$. Hence within the shell
$\phi_1$ = constant we can repeat the analysis used for one degree of freedom. We find that
the necessary and sufficient condition for $w$ to be similar to $W$ with regard to both $\theta_1$ and $\theta_2$ is

$$\int_{w(\phi_1, \phi_2)} p_0\, dw(\phi_1, \phi_2) = (1 - \alpha) \int_{W(\phi_1, \phi_2)} p_0\, dW(\phi_1, \phi_2), \tag{26.49}$$

where $W(\phi_1, \phi_2)$ is the intersection of $\phi_1 = C_1$, $\phi_2 = C_2$ for any values of $C_1$ and $C_2$; and similarly
for $w$.

As before, the most general region $w$ is obtained by amalgamating the portions of size
$(1 - \alpha)$ on the intersections of $\phi_1$ and $\phi_2$. The generalisation to $r$ degrees of freedom is
immediate. It also follows in the usual way that the best critical region is the one for
which

$$\int_{w_0} p_t\, dx \ge \int_v p_t\, dx,$$

$v$ being any other region of size $1 - \alpha$; and $w_0$ is defined by

$$p_t \ge k(\phi_1 \ldots \phi_r)\, p_0. \tag{26.50}$$

The following examples will illustrate the theory.


Example 26.8. Ratio of Two Variances 

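Suppose we have two samples of $n_1$, $n_2$ members from independent normal populations
whose means and variances are unknown. The joint distribution may be expressed as

$$f \propto \frac{1}{\sigma_1^{n_1} \sigma_2^{n_2}} \exp \left[ -\frac{n_1}{2 \sigma_1^2} \{ (\bar{x}_1 - \mu_1)^2 + s_1^2 \} - \frac{n_2}{2 \sigma_2^2} \{ (\bar{x}_2 - \mu_2)^2 + s_2^2 \} \right].$$

Consider the composite hypothesis $\sigma_1 = \sigma_2 = \sigma$, say. This has three degrees of freedom,
for $\mu_1$, $\mu_2$ and $\sigma$ are unspecified. As the alternative $H_t$ we will take

$$\theta_1 = \mu_1, \quad \theta_2 = \mu_2 - \mu_1 = b_1, \quad \theta_3 = \sigma_1, \quad \theta_4 = \sigma_2 / \sigma_1;$$

and for $H_0$ itself

$$\theta_1 = \mu, \quad \theta_2 = b, \quad \theta_3 = \sigma, \quad \theta_4 = 1.$$

We have first to consider whether the conditions of 26.29 are satisfied.

(1) Evidently $p_0$ is differentiable for all parameters any number of times.

(2) We find

$$\phi_1 = \frac{\partial}{\partial \mu} \log p_0 = \frac{1}{\sigma^2} \{ n_1 (\bar{x}_1 - \mu) + n_2 (\bar{x}_2 - \mu - b) \},$$

$$\phi_2 = \frac{\partial}{\partial b} \log p_0 = \frac{n_2}{\sigma^2} (\bar{x}_2 - \mu - b),$$

$$\phi_3 = \frac{\partial}{\partial \sigma} \log p_0 = -\frac{n_1 + n_2}{\sigma} + \frac{1}{\sigma^3} \{ n_1 (\bar{x}_1 - \mu)^2 + n_2 (\bar{x}_2 - \mu - b)^2 + n_1 s_1^2 + n_2 s_2^2 \},$$

and (26.47) is seen to be satisfied.

(3) The hypersurfaces $\phi_1 = C_1$ are evidently equivalent to

$$n_1 \bar{x}_1 + n_2 \bar{x}_2 = C_1',$$

where $C_1'$ is an arbitrary parameter. The hypersurfaces $\phi_2 = C_2$ give similarly

$$\bar{x}_2 = C_2'.$$

Both these families are independent of the parameters, and so are their intersections, namely
$\bar{x}_1$ = constant, $\bar{x}_2$ = constant. Thus the third condition is fulfilled and we may apply the
foregoing theory.

The equations $\phi_1$ = constant, $\phi_2$ = constant, $\phi_3$ = constant are equivalent to

$$\bar{x}_1 = \text{constant}$$

$$\bar{x}_2 = \text{constant}$$

$$n_1 s_1^2 + n_2 s_2^2 = \text{constant} = (n_1 + n_2)\, s_a^2, \text{ say}.$$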


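The element $w_0$ is part of $W(\phi_1, \phi_2, \phi_3)$ within which

$$p_t \ge p_0 / k(\phi_1, \phi_2, \phi_3)$$

and this condition, by reference to the frequency function, becomes

$$\frac{1}{k\, \sigma^{n_1 + n_2}} \exp \left[ -\frac{1}{2 \sigma^2} \{ n_1 (\bar{x}_1 - \mu)^2 + n_1 s_1^2 + n_2 (\bar{x}_2 - \mu - b)^2 + n_2 s_2^2 \} \right]$$

$$\le \frac{1}{\sigma_1^{n_1 + n_2} \theta_4^{n_2}} \exp \left[ -\frac{1}{2 \sigma_1^2} \left\{ n_1 (\bar{x}_1 - \mu_1)^2 + n_1 s_1^2 + \frac{n_2}{\theta_4^2} \{ (\bar{x}_2 - \mu_1 - b_1)^2 + s_2^2 \} \right\} \right].$$

Since the region $w$ is independent of $\mu$, $b$ and $\sigma$, we may put them respectively equal to
$\mu_1$, $b_1$ and $\sigma_1$ and hence find for the condition

$$n_2 (1 - \theta_4^2) \{ (\bar{x}_2 - \mu_1 - b_1)^2 + s_2^2 \} \le 2 \sigma_1^2 \theta_4^2 (\log k - n_2 \log \theta_4).$$

Since this inequality holds good on $\bar{x}_1$, $\bar{x}_2$, $s_a^2$ = constant it contains only one variable $s_2^2$ and we
accordingly find two cases:

If $\theta_4 = \sigma_2 / \sigma_1 > 1$ the best region is defined by $s_2^2 \ge k'(\bar{x}_1, \bar{x}_2, s_a^2)$;

If $\theta_4 = \sigma_2 / \sigma_1 < 1$ the best region is defined by $s_2^2 \le k'(\bar{x}_1, \bar{x}_2, s_a^2)$.

We have now to determine $k'$ so as to satisfy

$$\int_{w_0(\phi_1, \phi_2, \phi_3)} p_0\, dx = (1 - \alpha) \int_{W(\phi_1, \phi_2, \phi_3)} p_0\, dx.$$

Now $W(\phi_1, \phi_2, \phi_3)$ is the locus for which $\bar{x}_1$, $\bar{x}_2$ and $s_a^2$ are constant, and thus the integral
on the right is the product of $1 - \alpha$ and the frequency function $p_0(\bar{x}_1, \bar{x}_2, s_a^2)$. Similarly
that on the left is the integral of this function over the region for which $s_2^2 \ge k'$. Thus

$$\int_{w_0} p_0\, dx = \int_{k'}^{k''} p_0(\bar{x}_1, \bar{x}_2, s_a^2, s_2^2)\, ds_2^2 \quad \text{in the first case},$$

with a similar expression but different limits in the second. Now we have for the joint
frequency function of $\bar{x}_1$, $\bar{x}_2$, $s_1^2$ and $s_2^2$

$$f \propto s_1^{n_1 - 3} s_2^{n_2 - 3} \exp \left[ -\frac{1}{2 \sigma^2} \{ n_1 (\bar{x}_1 - \mu_1)^2 + n_2 (\bar{x}_2 - \mu_2)^2 + (n_1 + n_2)\, s_a^2 \} \right].$$

Transforming from $s_1^2$ to $s_a^2$ as variable, we find for the condition, after a little reduction,

$$\int_{k'}^{k''} \{ (n_1 + n_2)\, s_a^2 - n_2 s_2^2 \}^{\frac{n_1 - 3}{2}} s_2^{n_2 - 3}\, ds_2^2 = (1 - \alpha) \int_0^{k''} \{ (n_1 + n_2)\, s_a^2 - n_2 s_2^2 \}^{\frac{n_1 - 3}{2}} s_2^{n_2 - 3}\, ds_2^2,$$

where $k'' = \dfrac{n_1 + n_2}{n_2}\, s_a^2$. On substituting $n_2 s_2^2 = (n_1 + n_2)\, s_a^2\, u$ we find

$$\int_0^{u_0'} (1 - u)^{\frac{n_1 - 3}{2}} u^{\frac{n_2 - 3}{2}}\, du = \int_{u_0}^1 (1 - u)^{\frac{n_1 - 3}{2}} u^{\frac{n_2 - 3}{2}}\, du = (1 - \alpha)\, B \left( \frac{n_1 - 1}{2}, \frac{n_2 - 1}{2} \right),$$

the second integral referring to the first case and the first integral to the second case.

It follows that $u_0$, $u_0'$ depend only on $\alpha$, $n_1$ and $n_2$. Thus, whatever the values of $\bar{x}_1$, $\bar{x}_2$
and $s_a^2$, the best critical region is defined by

$$s_2^2 \ge k' = \frac{n_1 + n_2}{n_2}\, s_a^2\, u_0 \quad \text{if } \sigma_2 > \sigma_1,$$

$$s_2^2 \le k' = \frac{n_1 + n_2}{n_2}\, s_a^2\, u_0' \quad \text{if } \sigma_2 < \sigma_1.$$

These are equivalent to

$$\frac{n_2 s_2^2}{n_1 s_1^2 + n_2 s_2^2} \ge u_0 \quad \text{if } \sigma_2 > \sigma_1,$$

$$\frac{n_2 s_2^2}{n_1 s_1^2 + n_2 s_2^2} \le u_0' \quad \text{if } \sigma_1 > \sigma_2.$$

If we put

$$z = \tfrac{1}{2} \log \frac{n_1 (n_2 - 1)\, s_1^2}{n_2 (n_1 - 1)\, s_2^2}$$

the $B$-distribution of $u$ reduces to Fisher’s form. The result we have reached is therefore
equivalent to showing that the $z$-test is the best for the ratio of two variances in normal
samples. As usual, there is no U.M.P. test for the whole range of the ratio from 0 to $\infty$,
but two U.M.P. tests for the ranges 0 to 1 and 1 to $\infty$ respectively.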
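
The connection between the statistic $u$ of Example 26.8 and Fisher's $z$ may be verified for
any pair of samples. The following is a minimal sketch, assuming $s_1^2$ and $s_2^2$ are the sample
variances with divisors $n_1$ and $n_2$ and that $z$ is formed from the variance estimates with
divisors $n_1 - 1$ and $n_2 - 1$; the function names and the sample values are hypothetical. Since
$u$ is a monotonic function of $z$, a critical region for the one is a critical region for the other.

```python
from math import exp, log
from statistics import pvariance


def u_statistic(x1, x2):
    """u = n2*s2^2 / (n1*s1^2 + n2*s2^2), the variances having divisor n."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = pvariance(x1), pvariance(x2)
    return n2 * v2 / (n1 * v1 + n2 * v2)


def fisher_z(x1, x2):
    """z = (1/2) log of the ratio of the variance estimates with divisors n - 1."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = pvariance(x1), pvariance(x2)
    return 0.5 * log(n1 * (n2 - 1) * v1 / (n2 * (n1 - 1) * v2))


def u_from_z(z, n1, n2):
    """u recovered from z: u = 1 / {1 + (n1 - 1)/(n2 - 1) * exp(2z)}."""
    return 1.0 / (1.0 + (n1 - 1) / (n2 - 1) * exp(2.0 * z))


if __name__ == "__main__":
    x1 = [1.2, 0.7, 2.9, 1.8, 2.2]   # illustrative samples
    x2 = [0.4, 3.1, -1.0, 2.5]
    print(u_statistic(x1, x2), u_from_z(fisher_z(x1, x2), len(x1), len(x2)))
```

The function `u_from_z` decreases as $z$ increases, so that large values of $u$ correspond to
small values of $z$ as here defined.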

Example 26.9. Difference of Two Means 

Consider again the previous example, where now the variances are unspecified but equal and the means $\mu_1$ and $\mu_2 = \mu_1 + b$ may have any values. The hypothesis is that $b = 0$ and has two degrees of freedom corresponding to $\mu$ and $\sigma$.

Let the alternative specify the parameters

$$\theta_1 = \mu_t, \quad \theta_2 = \sigma_t, \quad \theta_3 = b_t.$$

In addition to the quantities required in the previous Example we now use also $\bar{x}_0$ and $s_0^2$, the mean and variance of the pooled samples.

We find that the three conditions of 26.29 are satisfied, and 

4>1 — (‘i^O /^l) 


rr 


Equivalent to this family are the surfaces

•i-^o 6\ 


,.2 n 

-S,, — 


The condition $p_t \ge k\,p_0$ reduces to

$$b_t(\bar{x}_1 - \bar{x}_0) \le k'(\bar{x}_0, s_0^2),$$

and as usual we find two cases according as $\mu_2 > \mu_1$ or vice versa. We consider only the first, the second being analogous.

Writing $v = \bar{x}_1 - \bar{x}_0$, we have to determine $h'$ by

$$\int_{h'''}^{h'} p_0(\bar{x}_0, s_0^2, v)\,dv = (1-\alpha)\int_{h'''}^{h''} p_0(\bar{x}_0, s_0^2, v)\,dv,$$

where $h'''$ and $h''$ are the lower and upper limits of the variation of $v$ for fixed values of $\bar{x}_0$ and $s_0^2$.

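The frequency function of $\bar{x}_0$, $v$ and $s_a^2$ is easily found to be

$$f \propto (s_a^2)^{\frac{n_1+n_2-4}{2}} \exp\left[-\frac{n_1+n_2}{2\sigma^2}\left\{(\bar{x}_0-\mu_1)^2 + s_0^2\right\}\right], \qquad s_0^2 = s_a^2 + \frac{n_1}{n_2}v^2,$$

whence that of $\bar{x}_0$, $s_0^2$ and $v$ is found to be

$$f \propto \left(s_0^2 - \frac{n_1}{n_2}v^2\right)^{\frac{n_1+n_2-4}{2}} \exp\left[-\frac{n_1+n_2}{2\sigma^2}\left\{(\bar{x}_0-\mu_1)^2 + s_0^2\right\}\right].$$

Since $\bar{x}_0$ and $s_0^2$ are constant over the domains under consideration we have to satisfy

$$\int_{h'''}^{h'} \left(s_0^2 - \frac{n_1}{n_2}v^2\right)^{\frac{n_1+n_2-4}{2}} dv = (1-\alpha)\int_{h'''}^{h''} \left(s_0^2 - \frac{n_1}{n_2}v^2\right)^{\frac{n_1+n_2-4}{2}} dv,$$

where $h'' = -h''' = s_0\sqrt{n_2/n_1}$. If we put

$$z = \frac{v\sqrt{n_1/n_2}}{\sqrt{s_0^2 - n_1 v^2/n_2}}$$

this reduces to

$$\frac{1}{B\!\left(\frac{1}{2}, \frac{n_1+n_2-2}{2}\right)}\int_{-\infty}^{z_0} \frac{dz}{(1+z^2)^{\frac{n_1+n_2-1}{2}}} = 1-\alpha$$

and

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{n_1 s_1^2 + n_2 s_2^2}}\sqrt{\frac{n_1 n_2}{n_1+n_2}}.$$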

We have thus arrived at the $t$-test for the difference of two means in normal variation when variances are equal. Once again the test we introduced on more or less intuitive grounds has been shown to be justified in the light of the theory developed in this chapter.
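
The link between the criterion $z$ of this example and the familiar pooled-variance $t$ can be checked numerically. The sketch below is only an illustration under the assumptions stated in its comments: the function name and the data are hypothetical, and the relation used is $t = z\sqrt{n_1+n_2-2}$.

```python
import math

def two_sample_z_t(x, y):
    """Return (z, t) for the difference of two means, equal variances assumed.

    z = (xbar1 - xbar2) / sqrt(n1*s1^2 + n2*s2^2) * sqrt(n1*n2/(n1+n2)),
    with s1^2, s2^2 the sample variances using divisor n, and
    t = z * sqrt(n1 + n2 - 2), the usual "Student" ratio.
    """
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    s1_sq = sum((a - m1) ** 2 for a in x) / n1
    s2_sq = sum((b - m2) ** 2 for b in y) / n2
    z = (m1 - m2) / math.sqrt(n1 * s1_sq + n2 * s2_sq) * math.sqrt(n1 * n2 / (n1 + n2))
    t = z * math.sqrt(n1 + n2 - 2)
    return z, t

# Hypothetical data, for illustration only.
x = [5.1, 4.8, 6.0, 5.5, 5.2]
y = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]
z, t = two_sample_z_t(x, y)
print(z, t)
```

The value of $t$ returned here coincides with the textbook pooled-variance statistic, so either form may be referred to the appropriate tables.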


Linear Hypotheses in Normal Variation 

26.30. Several of the hypotheses dealt with in foregoing examples are particular 
cases of a general class known as linear hypotheses, which accounts for the fact that we 
keep arriving at the same sort of conclusions respecting them. 

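Suppose we have $n$ independent variates typified by $x_k$ distributed in the normal form

$$\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x_k-\mu_k)^2}{2\sigma^2}\right\}$$

with common variance $\sigma^2$ but different means. Suppose the means are connected with $r$ and $s$ unknown parameters $\theta_1 \ldots \theta_r \ldots \theta_{r+s}$ by linear equations of the type

$$\mu_k = \sum_j c_{jk}\,\theta_j. \qquad (26.51)$$

Suppose further that the hypothesis $H_0$ specifies $r$ parameters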

01 = Ri, . . . 0^ = B^, 

and hence is composite with $s$ degrees of freedom. Then $H_0$ will be called a “linear hypothesis”. The reader can verify for himself that “Student's” hypothesis, and the hypothesis as to the difference of two means when variances are equal, are of this type. The homogeneity test in variance-analysis and the test of regression coefficients are also reducible to the same form. If, of course, $H_0$ specifies $r$ linear relations among the $\theta$'s instead of the $\theta$'s themselves, it can be reduced to a hypothesis which specifies the $\theta$'s directly, except perhaps in degenerate cases which need not detain us.

26.31. The theory developed in the earlier part of the chapter for composite 
hypotheses may be applied to linear hypotheses as we have defined them, and the argument 
follows exactly that of Examples 26.8 and 26.9. It is readily verified that the three conditions of 26.29 are satisfied. We have


k ' < 7 ^ 

<j)j = constant 

4*a = I — {^k ~ 

(7 k 

2n Z , 

<Pa — 2 


. (26.52) 


. (26.53) 


We can therefore find similar regions $w(\phi_1 \ldots \phi_s, \phi_\sigma)$ and select from them the best critical regions in the usual manner. We will omit the rather cumbrous algebra and quote the following result (Kolodzieczyk, 1935).

Transform to new variates $\xi_1 \ldots \xi_{r+s}$, $y_{r+s+1} \ldots y_n$ by the equation

r4j8 n 

^jk ^jk Vj’ . • • • (26.54:) 

j — 1 j:=r+S + l 

where the $c$'s are those given in (26.51) for $k \le r + s$ and the other $c$'s are orthogonal, i.e.

k » •' 

2Jc„Cji = 0. k^j. i>r + sl .(26.65) 

= L k =j, j> r -\- 




Define 


^ y] 


and 




:;==^r+.s-M 

n / r4-« \ 2 

\j-l / 


A further transformation is now made to new variables, so that (26.57) becomes


r-\-s 


nSl 




Wk 


j,k^l 


fc=r+l 


r+s 




wt 


(26.56) 

(26.57) 

Vr+s SO 

(26.58) 

(26.59) 


A=r+1 


The coefficients $B$ can, of course, be obtained from the $c$'s by ordinary determinantal algebra. Writing $\delta_j$ for the difference between $\theta_j$ on the alternative hypothesis and its value if $H_0$ is true, we find that the best critical region is given by


7 

^ Bjk 


j,k=X 


V(»s; + «S|) ,/ 


> Vo, . 


V 


(26.60) 
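where $v$ is distributed in the form

$$dF \propto (1 - v^2)^{\frac{n-s-3}{2}}\,dv, \qquad -1 \le v \le 1, \qquad (26.61)$$

and $v_0$ is given by

$$\int_{v_0}^{1} (1 - v^2)^{\frac{n-s-3}{2}}\,dv = (1-\alpha)\,B\!\left(\tfrac{1}{2}, \tfrac{n-s-1}{2}\right). \qquad (26.62)$$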


26.32. There is one interesting conclusion to be drawn from (26.60). If a U.M.P. test exists, $v$ should be independent of $\theta_j$ and hence of $\delta_j$. This appears to be possible only if the denominator in the second part of (26.60) is rational. But this denominator is seen from (26.59) to have the coefficients of a positive definite form and hence is only rational if $r = 1$. We conclude that if $r \ge 2$ no U.M.P. test is possible for linear hypotheses in normal variation.

We have already seen that under general conditions no U.M.P. test exists for $r = 1$. A similar conclusion follows from (26.60) if $r = 1$, for it then becomes


-^11 ^iFx 

Vi^ii) 1 £i 


> V 


05 


which, as usual, leads to two cases according as $\delta_1 \gtrless 0$.


(26.63) 


26.33. We will pause at this point to review our results. We began by defining two kinds of error and showing that a test could be defined as “best” for a single alternative hypothesis if it controlled the first kind and reduced the second to a minimum. When there is a class of admissible alternatives we may sometimes arrive at a U.M.P. test which will minimise errors of the second kind for any member of the class, and such a test may be regarded as the best attainable. Though the U.M.P. test does not exist in the great majority of cases, we may find tests which are U.M.P. for either $\theta_1 > \theta_0$ or $\theta_1 < \theta_0$. Such tests have been reached for “Student's” hypothesis and several others in common use, and are found to give the same tests as those introduced on rather intuitive grounds in Chapter 21.


26.34. The frequent non-existence of a U.M.P. test implies that in the majority of cases we have to look for other criteria to provide “best” tests. In the remainder of this chapter and in the next we shall consider several lines of approach which have been developed:

(a) We may use tests based on the likelihood ratio. These will give U.M.P. tests if such exist, and in the contrary case will do their best, so to speak, by providing a sort of common denominator among the best critical regions.

(b) We may consider the properties of tests when the sample number $n$ tends to infinity [...]

(c) [...] statistical tests, which will be explained in the next chapter.

(d) If no test exists which is U.M.P. everywhere, we may seek for one which is U.M.P. in the neighbourhood of the true value. The idea behind this approach is that it will be more important to detect errors in the neighbourhood of the true value, and that large errors may be left to look after themselves, either because they are infrequent or because almost any “reasonable” test will reveal them.*

(e) When a number of independent parameters are involved, we may abandon the attempt to test for each separately and confine our attention to the class of hypotheses for which they are functionally related, e.g. by $\psi = f(\theta_1 \ldots \theta_k)$. This reduces our problem to the case of a single parameter $\psi$, and we may be able to show that a particular $\psi_0$ is the best in the sense that it is U.M.P. with respect to all other $\psi$'s, that is, to all other tests depending on the single function of the unknown parameters.

We proceed to consider these approaches. 


Tests Based on Likelihood 


26.35. Suppose that for a given member of a composite hypothesis $H_0$ the joint sampling distribution of the variables $x_1 \ldots x_n$ has a frequency function $p_0$ (which is, of course, the likelihood). Considering the $x$'s as fixed, we may examine the variation of $p_0$ according to variation in the unspecified parameters $\theta_1 \ldots \theta_s$ which form a set, say $\omega$. Let $p_0(\omega\ \text{max.})$ be the maximum value of $p_0$ for such variation. Similarly, if $\Omega$ is the class of admissible alternatives $H_t$, let $p_t(\Omega\ \text{max.})$ be the maximum of the likelihood for variations of all the parameters $\theta_1 \ldots \theta_{r+s}$. Write

$$\lambda = \frac{p_0(\omega\ \text{max.})}{p_t(\Omega\ \text{max.})}. \qquad (26.64)$$

Then a possible criterion for accepting $H_0$ is to take as critical regions those points for which

$$\lambda \le \text{constant} = C, \text{ say,} \qquad (26.65)$$

where $C$ is determined by relation to a probability level $\alpha$ from the sampling distribution of $\lambda$, which of course is independent of the unknown parameters. In defining $\lambda$ we have assumed that the maxima on the right of (26.64) exist, but we can give the equation greater generality by taking $p_0(\omega\ \text{max.})$ as the upper bound of values of $p_0$ in the set $\omega$ where no maximum exists; and so for $\Omega$.

In this form the criterion states that we are to accept $H_0$ if the maximum likelihood in the set of permissible $H_0$'s is greater than a specified proportion of that in the set of alternatives $H_t$. In doing so we control the first kind of error in the ordinary way. So far as concerns the second kind of error we saw in 26.18 that for $H_0$ simple the criterion provided a sort of highest common factor among available tests; and presumably qualities of this kind will be equally useful when $H_0$ is composite.
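
To make the definition of $\lambda$ concrete, the sketch below evaluates it for the simplest composite case, “Student's” hypothesis that a normal mean equals $\mu_0$ with the variance unspecified. This is an illustrative assumption of ours rather than a computation taken from the text; the function names and data are hypothetical. The two maxima are found in closed form, and the result is compared with the known relation $\lambda^{2/n} = 1/\{1 + t^2/(n-1)\}$.

```python
import math

def normal_loglik(xs, mu, sigma_sq):
    """Log-likelihood of a normal sample for given mean and variance."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma_sq)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma_sq))

def likelihood_ratio_student(xs, mu0):
    """lambda = p0(omega max.) / p(Omega max.) for H0: mean = mu0, variance free."""
    n = len(xs)
    xbar = sum(xs) / n
    var_omega_big = sum((x - xbar) ** 2 for x in xs) / n   # ML variance, mean free
    var_omega_small = sum((x - mu0) ** 2 for x in xs) / n  # ML variance, mean fixed at mu0
    log_lam = (normal_loglik(xs, mu0, var_omega_small)
               - normal_loglik(xs, xbar, var_omega_big))
    return math.exp(log_lam)

# Hypothetical data, for illustration only.
xs = [4.1, 5.3, 6.0, 4.7, 5.9]
mu0 = 5.0
lam = likelihood_ratio_student(xs, mu0)
print(lam)
```

Small values of $\lambda$ correspond to large values of $|t|$, so the region $\lambda \le C$ is the usual two-sided region for $t$.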


The Problem of k Samples

26.36. We will illustrate the theory of the likelihood tests by discussing a problem of considerable practical importance. Suppose we have a sample from each of $k$ normal populations, $x_{ij}$ being the $j$th member of the $i$th sample. Let

$n_i$ be the number in the $i$th sample;

$N = \sum (n_i)$ be the total number of observations;
$\bar{x}_i$ be the mean of the $i$th sample;

$s_i^2$ be the variance of the $i$th sample.

* An alternative line would be to concentrate on errors of the second kind for larger deviations, 
on the ground that large errors are more important than small ones. I understand from Dr. B. L. Welch 
that he considered this approach shortly before the war ; the results did not differ very materially from 
those given by requiring optimum properties near the true value in the case he examined, and the 
results were not published. 
We will consider three different hypotheses:

(1) $H$, that all populations are the same and hence have the same unspecified mean and unspecified variance.

(2) $H_1$, that they have the same variance but different unspecified means $\mu_1 \ldots \mu_k$.

(3) $H_2$, when it is known that they have the same variance, that they have the same means.


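We have for the joint likelihood

$$p = \prod_{i=1}^{k}\left(\frac{1}{\sigma_i\sqrt{2\pi}}\right)^{n_i} \exp\left[-\sum_{i=1}^{k}\frac{n_i\left\{(\bar{x}_i-\mu_i)^2 + s_i^2\right\}}{2\sigma_i^2}\right]. \qquad (26.66)$$

Consider first of all $H$. We find, for $p(\Omega\ \text{max.})$,

$$\mu_i = \bar{x}_i, \qquad \sigma_i^2 = s_i^2, \qquad (26.67)$$

and for $p(\omega\ \text{max.})$, putting all the $\mu$'s and $\sigma$'s equal and equating the first partials of $\log p$ to zero,

$$\mu = \bar{x}_0 = \frac{1}{N}\sum_{i=1}^{k} n_i\bar{x}_i, \qquad (26.68)$$

$$\sigma^2 = s_0^2 = \frac{1}{N}\sum_{i=1}^{k} n_i\left\{(\bar{x}_i-\bar{x}_0)^2 + s_i^2\right\}. \qquad (26.69)$$

Inserting these values in $p$ we find, after a little reduction,

$$\lambda_H = \prod_{i=1}^{k}\left(\frac{s_i^2}{s_0^2}\right)^{n_i/2}. \qquad (26.70)$$

Similarly it may be shown that

$$\lambda_{H_1} = \prod_{i=1}^{k}\left(\frac{s_i^2}{s_a^2}\right)^{n_i/2}, \qquad (26.71)$$

where

$$s_a^2 = \frac{1}{N}\sum_{i=1}^{k} n_i s_i^2, \qquad (26.72)$$

and also that

$$\lambda_{H_2} = \left(\frac{s_a^2}{s_0^2}\right)^{N/2}. \qquad (26.73)$$

It will be noticed that $\lambda_H = \lambda_{H_1}\lambda_{H_2}$.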

26.37. The function $\lambda_{H_2}$ may be related to the correlation ratio $\eta^2$. We have

$$N s_0^2 = N s_a^2 + \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x}_0)^2 \qquad (26.74)$$

and hence

$$\lambda_{H_2} = \left(\frac{s_a^2}{s_0^2}\right)^{N/2} = (1-\eta^2)^{N/2},$$

which has a known form for $\eta^2$ in samples from an uncorrelated population.

We also find

$$\lambda_H^{2/N} = \frac{\left\{\prod (s_i^2)^{n_i}\right\}^{1/N}}{s_0^2}. \qquad (26.77)$$

The distribution of $\lambda_{H_2}^{2/N}$ is that of $1 - \eta^2$, where the distribution of $\eta^2$ is

$$dF \propto (\eta^2)^{\frac{k-3}{2}}(1-\eta^2)^{\frac{N-k-2}{2}}\,d\eta^2. \qquad (26.78)$$

It can accordingly be tested in this distribution or the related $z$-form. This is, in fact, the criterion used in the analysis of variance for homogeneity tests, and it is interesting to remark that the $z$-test here arises in considering the hypothesis that the various distributions parent to the sample values, being already known to have the same variance, have the same mean. The other form of hypothesis, $H$, is that the samples come from the same population, and the equality of variance is not part of the data but part of the hypothesis. We are not then surprised, or should not be so, to find that the criterion leads to a different test.
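
The three criteria are easy to compute from sample summaries. The sketch below is a minimal illustration, assuming the forms $\lambda_H^{2/N} = G/s_0^2$, $\lambda_{H_1}^{2/N} = G/s_a^2$ and $\lambda_{H_2}^{2/N} = s_a^2/s_0^2$, where $G$ is the weighted geometric mean of the $s_i^2$; the function name and the dictionary keys are our own. The figures used are the sample means and variances tabulated in Exercise 26.6 of this chapter.

```python
import math

def k_sample_criteria(n, means, variances):
    """Return the 2/N-th powers of the three likelihood criteria for k normal samples.

    n: list of sample sizes; means: sample means; variances: sample variances (divisor n_i).
    """
    N = sum(n)
    x0 = sum(ni * m for ni, m in zip(n, means)) / N
    sa_sq = sum(ni * v for ni, v in zip(n, variances)) / N                # within-sample variance
    s0_sq = sa_sq + sum(ni * (m - x0) ** 2 for ni, m in zip(n, means)) / N  # pooled variance
    geo_mean = math.exp(sum(ni * math.log(v) for ni, v in zip(n, variances)) / N)
    return {
        "sa_sq": sa_sq,
        "s0_sq": s0_sq,
        "L_H": geo_mean / s0_sq,    # all populations identical
        "L_H1": geo_mean / sa_sq,   # equal variances, means free
        "L_H2": sa_sq / s0_sq,      # equal means, variances known equal
        "eta_sq": 1.0 - sa_sq / s0_sq,
    }

# Means and variances of the six samples of six in Exercise 26.6.
means = [8433, 8200, 7933, 8120, 7971, 8263]
variances = [24722, 94133, 149733, 45037, 88480, 49921]
result = k_sample_criteria([6] * 6, means, variances)
print(result)
```

The product relation between the three criteria holds exactly in this form, which gives a convenient arithmetic check on hand computation.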


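26.38. The moments of the distribution of $\lambda_H$ may be obtained as follows. The joint distribution of $\bar{x}_i$ and $s_i^2$ is

$$dF \propto \prod s_i^{\,n_i-3} \exp\left[-\sum\frac{n_i}{2\sigma^2}\left\{(\bar{x}_i-\mu)^2 + s_i^2\right\}\right]\prod d\bar{x}_i \prod ds_i^2. \qquad (26.79)$$

The distribution of means is independent of that of variances and can be ignored. Further, if

$$s_m^2 = \frac{1}{N}\sum n_i(\bar{x}_i - \bar{x}_0)^2,$$

then $s_m^2$ is also independent of the variances, and we have

$$dF \propto \prod s_i^{\,n_i-3}\exp\left(-\frac{n_i s_i^2}{2\sigma^2}\right)\, s_m^{\,k-3}\exp\left(-\frac{N s_m^2}{2\sigma^2}\right)\prod ds_i^2\; ds_m^2. \qquad (26.80)$$

Put now

$$\psi_i = \frac{n_i s_i^2}{N s_0^2} \qquad (26.81)$$

and note that

$$N s_m^2 = N s_0^2 - \sum n_i s_i^2 = N s_0^2\left(1 - \sum \psi_i\right). \qquad (26.82)$$

Transforming to variables $\psi_i$ and $s_0$, we find

$$dF \propto \prod \psi_i^{\frac{n_i-3}{2}}\left(1 - \sum\psi_i\right)^{\frac{k-3}{2}}\prod d\psi_i \; s_0^{\,N-2}\exp\left(-\frac{N s_0^2}{2\sigma^2}\right)ds_0,$$

whence, for the distribution of the $\psi$'s,

$$dF \propto \prod \psi_i^{\frac{n_i-3}{2}}\left(1 - \sum\psi_i\right)^{\frac{k-3}{2}}\prod d\psi_i. \qquad (26.83)$$

Now

$$\lambda_H = \prod_{i=1}^{k}\left(\frac{N\psi_i}{n_i}\right)^{n_i/2} \qquad (26.84)$$

and hence we may find the moments of $\lambda_H$ by integrating its powers over the distribution (26.83). Integrals of this kind, known as Dirichlet's, are expressible in terms of $\Gamma$-functions and we find, for the $p$th moment of $\lambda_H$ about zero,

$$\mu_p'(\lambda_H) = N^{\frac{pN}{2}}\,\frac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{(p+1)N-1}{2}\right)}\prod_{i=1}^{k}\frac{\Gamma\!\left(\frac{(p+1)n_i-1}{2}\right)}{n_i^{\frac{pn_i}{2}}\,\Gamma\!\left(\frac{n_i-1}{2}\right)}. \qquad (26.85)$$

When all the $n$'s are equal this reduces to

$$\mu_p'(\lambda_H) = k^{\frac{pN}{2}}\,\frac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{(p+1)N-1}{2}\right)}\left\{\frac{\Gamma\!\left(\frac{(p+1)n-1}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right)}\right\}^{k}. \qquad (26.86)$$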

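26.39. For the criterion $\lambda_{H_1}$ we start from the distribution

$$dF \propto \prod s_i^{\,n_i-3}\exp\left(-\frac{n_i s_i^2}{2\sigma^2}\right)\prod ds_i^2$$

and on putting

$$n_i s_i^2 = N s_a^2\,\zeta_i, \quad i = 1, 2 \ldots k-1; \qquad n_k s_k^2 = N s_a^2\left(1 - \sum_{i=1}^{k-1}\zeta_i\right),$$

we find, in much the same way as before,

$$dF(\zeta_1 \ldots \zeta_{k-1}) \propto \prod_{i=1}^{k-1}\zeta_i^{\frac{n_i-3}{2}}\left(1 - \sum_{i=1}^{k-1}\zeta_i\right)^{\frac{n_k-3}{2}}\prod_{i=1}^{k-1}d\zeta_i. \qquad (26.87)$$

Further,

$$\lambda_{H_1} = N^{\frac{N}{2}}\prod_{i=1}^{k-1}\left(\frac{\zeta_i}{n_i}\right)^{\frac{n_i}{2}}\left(\frac{1 - \sum\zeta_i}{n_k}\right)^{\frac{n_k}{2}},$$

whence we find

$$\mu_p'(\lambda_{H_1}) = N^{\frac{pN}{2}}\,\frac{\Gamma\!\left(\frac{N-k}{2}\right)}{\Gamma\!\left(\frac{(p+1)N-k}{2}\right)}\prod_{i=1}^{k}\frac{\Gamma\!\left(\frac{(p+1)n_i-1}{2}\right)}{n_i^{\frac{pn_i}{2}}\,\Gamma\!\left(\frac{n_i-1}{2}\right)}. \qquad (26.88)$$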

26.40. For large $n_i$ we find, in virtue of the Stirling approximation to the gamma function,

(1) for $\lambda_H$,

$$\mu_p' = \frac{1}{(p+1)^{k-1}}; \qquad (26.89)$$

(2) for $\lambda_{H_1}$,

$$\mu_p' = \frac{1}{(p+1)^{\frac{k-1}{2}}}; \qquad (26.90)$$

(3) for $\lambda_{H_2}$,

$$\mu_p' = \frac{1}{(p+1)^{\frac{k-1}{2}}}. \qquad (26.91)$$

These limiting forms are the moments of the distributions:

(1) $dF \propto (-\log x)^{k-2}\,dx$,

(2) and (3) $dF \propto (-\log x)^{\frac{k-3}{2}}\,dx$,

with $0 \le x \le 1$. Hence, by the transformation $x = e^{-\frac{1}{2}\chi^2}$ we see that approximately $-2\log\lambda_H$ is distributed as $\chi^2$ with $\nu = 2k - 2$, and $-2\log\lambda_{H_1}$ and $-2\log\lambda_{H_2}$ as $\chi^2$ with $\nu = k - 1$.
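
The approach of the exact moments to their limiting values can be examined numerically. The sketch below assumes the Dirichlet-integral form of the $p$th moment of $\lambda_H$, namely $N^{pN/2}\,\Gamma\{(N-1)/2\}/\Gamma\{((p+1)N-1)/2\}$ multiplied by $\prod \Gamma\{((p+1)n_i-1)/2\}/[n_i^{pn_i/2}\,\Gamma\{(n_i-1)/2\}]$, and compares it with $(p+1)^{-(k-1)}$. The function names are ours, and logarithms of gamma functions are used throughout to avoid overflow.

```python
import math

def moment_lambda_H(p, n):
    """p-th moment about zero of lambda_H under H, for a list n of sample sizes."""
    N = sum(n)
    log_m = 0.5 * p * N * math.log(N)
    log_m += math.lgamma((N - 1) / 2) - math.lgamma(((p + 1) * N - 1) / 2)
    for ni in n:
        log_m += math.lgamma(((p + 1) * ni - 1) / 2) - math.lgamma((ni - 1) / 2)
        log_m -= 0.5 * p * ni * math.log(ni)
    return math.exp(log_m)

def limiting_moment(p, k):
    """Large-sample limit: the moment of exp(-chi^2 / 2) with 2k - 2 degrees of freedom."""
    return (p + 1) ** (-(k - 1))

# Three samples: compare small and large sample sizes with the limit for p = 1.
for size in (5, 50, 2000):
    print(size, moment_lambda_H(1, [size] * 3), limiting_moment(1, 3))
```

For small samples the exact moment differs appreciably from the limit, which is the reason for seeking the closer approximations discussed in the next section.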


26.41. For small samples Neyman and Pearson have suggested approximating to the distributions of $\lambda_H^{2/N}$ and $\lambda_{H_1}^{2/N}$ by identifying their lower moments with those of the form

$$dF \propto x^{m_1-1}(1-x)^{m_2-1}\,dx.$$

This possibility has been examined in detail by Nayer (1936) for the hypothesis $H_1$ when all the $n$'s are equal. The distribution of $\lambda_H$ has also been studied by Wilks and Thompson (1937a).


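26.42. Modified forms of the above tests have been considered by various authors. We may write

$$\log\lambda_{H_1} = \tfrac{1}{2}\sum n_i\log s_i^2 - \tfrac{1}{2}N\log s_a^2, \qquad (26.92)$$

where, of course,

$$s_a^2 = \frac{1}{N}\sum n_i s_i^2.$$

In short, $s_a^2$ is a weighted mean of the $s_i^2$ and $\lambda_{H_1}^{2/N}s_a^2$ is a weighted geometric mean. Bartlett (1937c) has proposed using the degrees of freedom $(n_i - 1)$ instead of $n_i$ in these equations, that is to say, defines a criterion $\mu$ by

$$-2\log\mu = (N-k)\log\left\{\frac{\sum n_i s_i^2}{N-k}\right\} - \sum (n_i-1)\log\left\{\frac{n_i s_i^2}{n_i-1}\right\}. \qquad (26.93)$$

This test is, in the sense defined in the next chapter, unbiassed, whereas that based on $\lambda_{H_1}$ is not. Bartlett also suggested as an approximation that $-2\log\mu/c$ could be regarded as distributed as $\chi^2$ with $k - 1$ degrees of freedom, $c$ being given by

$$c = 1 + \frac{1}{3(k-1)}\left\{\sum\frac{1}{n_i-1} - \frac{1}{N-k}\right\}. \qquad (26.94)$$

This has recently been reconsidered by Hartley (1940), who showed that it is not very exact for large $k$ and gave a better approximation which can be reduced to tabular form. Cf. Exercise 27.2.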
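Likelihood Criteria for the Linear Hypothesis

26.43. We now proceed to consider the application of the likelihood criterion to the class of linear hypothesis as defined in 26.30. We have, for the likelihood function,

$$p = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{k=1}^{n}(x_k-\mu_k)^2\right\}. \qquad (26.95)$$

Writing $S^2 = \sum (x_k - \mu_k)^2$ we have, for the stationary values of $p$ with respect to $\sigma$ and the parameters $\theta$ (related to the $\mu$'s by (26.51)),

$$\frac{\partial}{\partial\sigma}\log p = -\frac{n}{\sigma} + \frac{S^2}{\sigma^3} = 0, \qquad (26.96)$$

$$\frac{\partial}{\partial\theta_j}\log p = \frac{1}{\sigma^2}\sum_{k=1}^{n}(x_k-\mu_k)\,c_{jk} = 0. \qquad (26.97)$$

This last equation is clearly the one we should get if we were seeking to minimise $S^2$ itself for variations in the $\theta$'s. Let $nS_a^2$ be this minimum value. We shall then have, from (26.96),

$$\sigma^2 = S_a^2. \qquad (26.98)$$

The maximum of $p$ in the class $\Omega$ of admissible hypotheses is then

$$p(\Omega\ \text{max.}) = \left(\frac{1}{S_a\sqrt{2\pi}}\right)^{n}e^{-\frac{n}{2}}. \qquad (26.99)$$

Similarly the maximum of $p$ in the class $\omega$ for which $\theta_1 \ldots \theta_r$ are fixed and the other $s$ $\theta$'s vary, is found to be

$$p(\omega\ \text{max.}) = \left(\frac{1}{\sqrt{S_a^2+S_b^2}\,\sqrt{2\pi}}\right)^{n}e^{-\frac{n}{2}}, \qquad (26.100)$$

where $n(S_a^2 + S_b^2)$ is the minimum of $S^2$ under the conditions that $\theta_1 \ldots \theta_r$ are fixed. Thus we find for the likelihood ratio

$$\lambda = \left(\frac{S_a^2}{S_a^2+S_b^2}\right)^{n/2}, \qquad (26.101)$$

or, if more convenient, we may use the function $S_a^2/(S_a^2+S_b^2)$ to provide a criterion.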

Now we make the transformation (26.54) and show that the values $S_a^2$ and $S_b^2$ as we have defined them here have, in fact, the values given by (26.56) and (26.59). We have, from (26.54),

$$S^2 = \sum_{k=1}^{n}\left\{\sum_{j=1}^{r+s}c_{jk}(\xi_j-\theta_j) + \sum_{j=r+s+1}^{n}c_{jk}\,y_j\right\}^2 = \sum_{k=1}^{n}\left\{\sum_{j=1}^{r+s}c_{jk}(\xi_j-\theta_j)\right\}^2 + \sum_{j=r+s+1}^{n}y_j^2.$$

Since $nS_a^2$ is the minimum of $S^2$ for all variations of the $\theta$'s and $\xi$ and $y$ are independent of the $\theta$'s, we must have

$$nS_a^2 = \sum y_j^2.$$

Also, since $n(S_a^2+S_b^2)$ is the minimum of $S^2$ when the values $\theta_1 \ldots \theta_r$ are fixed, $nS_b^2$ is seen to have the value given in (26.59).

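We have also

$$S^2 = n(S_a^2 + S'^2), \qquad (26.102)$$

where

$$nS'^2 = \sum_{k=1}^{n}\left\{\sum_{j=1}^{r+s}c_{jk}(\xi_j-\theta_j)\right\}^2,$$

and the frequency function of $\xi$'s and $y$'s is given by

$$f(\xi_1 \ldots \xi_{r+s},\; y_{r+s+1} \ldots y_n) \propto \exp\left\{-\frac{n}{2\sigma^2}(S_a^2 + S'^2)\right\}. \qquad (26.103)$$

Now $nS_a^2$ is the sum of squares of $n - r - s$ normal variates, and hence

$$f(S_a) \propto S_a^{\,n-r-s-1}\exp\left(-\frac{nS_a^2}{2\sigma^2}\right). \qquad (26.104)$$

Hence, since the $\xi$'s are independent of the $y$'s, and since $S_a^2$ depends only on the $y$'s,

$$f(S_a, \xi_1 \ldots \xi_{r+s}) \propto S_a^{\,n-r-s-1}\exp\left\{-\frac{n}{2\sigma^2}(S_a^2 + S'^2)\right\}. \qquad (26.105)$$

We have seen, in effect, that $S_b^2$ is the minimum value of $S'^2$. It depends on $\xi_1 \ldots \xi_r$ and hence is independent of $S_a^2$ and is distributed as

$$f(S_b) \propto S_b^{\,r-1}\exp\left(-\frac{nS_b^2}{2\sigma^2}\right). \qquad (26.106)$$

Thus we have

$$f(S_a, S_b) \propto S_a^{\,n-r-s-1}\,S_b^{\,r-1}\exp\left\{-\frac{n}{2\sigma^2}(S_a^2 + S_b^2)\right\}. \qquad (26.107)$$

Putting now $Z = S_b^2/S_a^2$ we find

$$f(Z) \propto Z^{\frac{r-2}{2}}(1+Z)^{-\frac{n-s}{2}}, \qquad (26.108)$$

which may be reduced to Fisher's form by putting

$$z = \tfrac{1}{2}\log\frac{S_b^2\,(n-r-s)}{r\,S_a^2} = \tfrac{1}{2}\log Z + \tfrac{1}{2}\log\frac{n-r-s}{r}.$$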

We have thus reduced the test of the linear hypothesis to the $z$-test and it is seen that several of the tests introduced in Chapter 21 can be justified on the likelihood criterion. These include the “Student” test for one mean, the extended form for the difference of two means, and the test for the ratio of variances. Certain other tests in which the $z$-distribution (which, of course, reduces to the $t$-distribution for $\nu_1 = 1$) appears, such as that of the correlation ratio, the multiple correlation coefficient and regression coefficients, also depend on the linear hypotheses, and in the light of the theory here presented are seen to be different aspects of the same thing, at least so far as the testing of hypotheses is concerned.
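
As an illustration of how a familiar test falls under the linear hypothesis, the sketch below treats the hypothesis that the slope of a simple linear regression is zero, with the intercept unspecified, so that $r = 1$ and $s = 1$. The choice of example, the function name and the data are our own assumptions. It forms $S_a^2$ and $S_b^2$ from the two constrained minima of the sum of squares, and the resulting variance ratio agrees with the usual expression in terms of the correlation coefficient.

```python
import math

def slope_test_as_linear_hypothesis(xs, ys):
    """Return (Sa_sq, Sb_sq, F, z) for H0: slope = 0 in y = a + b*x.

    n*Sa_sq is the minimum of the sum of squares with both a and b free;
    n*(Sa_sq + Sb_sq) is the minimum with b fixed at 0 and a free.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    Sa_sq = (syy - b * sxy) / n
    Sb_sq = syy / n - Sa_sq
    r, s = 1, 1                      # one specified and one unspecified parameter
    Z = Sb_sq / Sa_sq
    F = Z * (n - r - s) / r          # variance ratio with r and n - r - s degrees of freedom
    z = 0.5 * math.log(F)            # Fisher's z
    return Sa_sq, Sb_sq, F, z

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]
Sa_sq, Sb_sq, F, z = slope_test_as_linear_hypothesis(xs, ys)
print(F, z)
```

Here $F$ is the square of the usual $t$ for a regression coefficient, in accordance with the remark that the $z$-distribution reduces to the $t$-distribution when $\nu_1 = 1$.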


26.44. We will indicate briefly, without going into the complicated mathematics involved, some interesting results obtained by P. C. Tang (1938) and P. L. Hsu (1941b) concerning the power of the $z$-test as applied to linear hypotheses.

The functions $S_a^2$ and $S_b^2$, as we have seen, are distributed independently in the $\chi^2$-form, and their ratio accordingly in Fisher's form. From this viewpoint the test of the linear hypothesis is a generalisation of the test of homogeneity in the analysis of variance. Tang considers the distribution of

$$E^2 = \frac{S_b^2}{S_a^2 + S_b^2} \qquad (26.109)$$

and the variation for errors of the second kind, namely, when the values $\theta_1 \ldots \theta_r$ are different from the specified values. He shows that the power of the test depends, not on individual alternative values, but on a single function of the $\theta$'s. He also obtains the power function and tabulates it.

Hsu then considers other possible tests which are based on this single function and shows that in this class of test the $z$-test or the equivalent $F$ test is the uniformly most powerful.


26.45. For large samples, when maximum likelihood estimators of the parameters exist, the distribution of $-2\log\lambda$ is that of $\chi^2$ with $r$ degrees of freedom. For the distribution may then be written (see 17.46):


dF = A exp 


so that 


p {Q max.) = A. . 


dQf+s 


. (26.110) 


If 01 ... 0„ are fixed the likelihood becomes 


where 


= A exp - r g]j, z] Zj, - 

T 

2o = ^ 9jk - ^j) 0k — Ok) 


(26.111) 




and Zj is given by §j — 9^ — where is a linear function of the r specified parameters. 
Thus — 

p [co max.) = Ao ..... (26.112) 

where Aq is the value of A when 6j takes its true value 0^-0. Thus, when Ho is true, 

(26.113) 

But the characteristic function oi xl ( = — 2 log X) is 

j S,... 

= ^ I exp |— I r 2 } 4 + xl (it - i) I . . . de,+,, 

CC — ^ (26.114) 

(1 — 2it)i 

This is the characteristic function of a quantity distributed as $\chi^2$ with $r$ degrees of freedom, and hence the result follows.


26.46. In concluding this chapter we may mention briefly a question which frequently presents itself when statistical hypotheses are being tested in practice. Our tests are based on the observed values obtained in the sampling process, and in order to apply them we require no prior knowledge of the parameters to which they relate. They can be used in a state of complete ignorance about the parameters. But suppose some information is already available; or suppose that we attach varying degrees of importance to the avoidance of particular types of error. How far are the tests developed in this chapter to be modified?


26.47. Consider, for example, the situation which has already been mentioned in connection with the theory of estimation, of the chemist who is assaying the strength of a particular drug. If the drug has harmful effects in large quantities it may be much more important for him to detect cases in which the true strength exceeds his hypothetical value than when the true strength is deficient. Again, the manufacturer of a “guaranteed” product is usually much more concerned with ensuring that it does not fall below the guaranteed standard than that it exceeds such standard. In such circumstances we may be particularly interested in “one-sided” tests of the type $\xi < \xi_0$, and as we have seen, there more often occur U.M.P. tests for this class of alternative than in the case when $\xi$ can have any value. We might, therefore, be quite ready to accept such a test, knowing quite well that it may be insensitive in part of the range of the unknown parameter, merely because errors in that range are relatively unimportant.

Similarly we might be willing to accept a test which had a poor discriminatory power 
in part of the range but compensating advantages elsewhere, simply because we know 
beforehand that values of the parameter rarely or never fall into that particular part of 
the range. This is equivalent to prior knowledge of the distribution of the values 
determining the alternative hypotheses. 

26.48. It is difficult to reduce rather vague prior knowledge of a parameter to numerical form, and hence to extend our theory with great precision to cover these cases; but in practice it is desirable to consider, before adopting a test, whether any prior knowledge is available, or whether our interests centre on particular parts of the range. If they do, we may consider the behaviour of power functions of the possible tests at our disposal and examine which is the more powerful test in the particular part of the range which interests us most. The mere fact that the theory developed in this and the succeeding chapter makes no assumptions about the prior probabilities of admissible alternatives does not mean that we should be acting sensibly in ignoring any prior information which may be at hand when applying the theory, or that we need feel compelled to apply tests with optimum properties in regions where we know the unknown parameter-values will not fall.


NOTES AND REFERENCES 

The theory of this chapter is very largely due to Neyman and E. S. Pearson, whose treatment has been closely followed. In their first contribution to the subject (1928) the likelihood criterion was developed, the theory of first and second kind of errors and power of tests being given in 1933. For the theory of unbiassed tests, see the papers of 1936 and 1938. In the last few years the literature has grown considerably.

Feller (1938) has shown that similar regions only exist in rather exceptional circumstances and that the theory of composite hypotheses is incomplete. Tables of certain power functions and distributions associated with likelihood tests are given by Mahalanobis (1933), Neyman and Tokarska (1936b), Wilks and Thompson (1937a), P. C. Tang (1938), David (1939), Nayer (1936), and in Tables for Statisticians, Part II (Tables 35-37). See also Mahalanobis (1933).

For tests based on the likelihood ratio, see Neyman and Pearson (1928, 1931a, 1931b), Pearson and Wilks (1933b), Wilks (1935a), Nayer (1936), Welch (1936a), R. W. Jackson (1936), Sukhatme (1936b), Bartlett (1937c), Wilks and Thompson (1937a), Wilks (1938a), Bishop (1939), G. W. Brown (1939), Mood (1939), Hartley (1940), Wald and Brookner (1941b).

For the general theory, see also Welch (1935), Kolodzieczyk (1935), Neyman (1935b, 1937b, 1938b), Daly (1940), Pitman (1939b), Wald (1939a, 1941a), Wolfowitz (1942), E. S. Pearson (1941, 1942a), Dantzig (1940), P. L. Hsu (1941b), Simaika (1941), MacStewart (1941), Scheffe (1942a, 1943).


EXERCISES 

26.1. Examine the following argument: To accept $H$ when it is false is equivalent to rejecting not-$H$ when not-$H$ is true. Hence, if $K$ = not-$H$, to commit an error of the second kind for $H$ is to commit an error of the first kind for $K$; and thus there is no distinction between the first and second kinds of error.


26.2. For the distribution

$$dF = \beta e^{-\beta(x-\gamma)}\,dx, \quad x \ge \gamma,$$
$$\phantom{dF} = 0, \quad x < \gamma,$$

show that for a hypothesis $H_0$ that $\beta = \beta_0$, $\gamma = \gamma_0$ and an alternative $H_1$ that $\beta = \beta_1$, $\gamma = \gamma_1$, the best critical region is the region $w_0$ where $p_0 = 0$, together with the region defined by

$$\bar{x} \le \frac{1}{\beta_1 - \beta_0}\left\{\gamma_1\beta_1 - \gamma_0\beta_0 - \frac{1}{n}\log k + \log\frac{\beta_1}{\beta_0}\right\},$$

provided that the admissible hypothesis is restricted by the conditions $\gamma_1 < \gamma_0$, $\beta_1 > \beta_0$. Hence show that a U.M.P. test exists in such circumstances.

(Neyman and Pearson, 1936a. This shows that a U.M.P. test can exist for more than one unknown parameter.)

dF = 


26.3. If the distribution function of is given by 

n , 

- ^ > dx^ . . . dx,^, 


aX^Tz) 


exp 


^ J=i 


ny 


y, o> 0, 

show that the frequency function may be put in the form 

{x — yY 


00 < it;, 


X, 


CO 


/ cc exp - 


2a 2 


exp 


and hence that x is a “ shared ” estimator sufficient for y and a. Show further that the 
best critical regions for yo, Oo differ according as a® > (^2^ ^^2 or a = Oo, and that 

their boundaries depend on $\gamma$. Hence no U.M.P. test exists for admissible alternatives

a > 0. 


(Neyman and Pearson, 1936a.) 
26.4. In the previous exercise put $\sigma = \gamma$ and consider the class of hypothesis $\gamma > 0$. Show that there are different best critical regions according as $\gamma > \gamma_0$ or $\gamma < \gamma_0$, and that their boundaries depend on $\gamma$. Hence there is no U.M.P. test, but x is sufficient for $\gamma$.

(Neyman and Pearson, 1936a.) 


26.5. In samples from a normal population, show that the probability of accepting the hypothesis that the mean $\mu = \mu_0$ when, in fact, it is false and $\mu = \mu_1 > \mu_0$ (that is, the probability of an error of the second kind) is


wV' 1 

1/2^ 


pOO 

J 0 


where 


exp - 

jLli — fXo 
G 


nv 


2 \ 1 p- 

r; vr2^J- 


du dv 


X — /i 

s 

of errors of the first kind. 


and t is the value of — — corresponding to the significance level 1 — a for the control 

[. 

(Neyman and Tokarska, 1936b.)


26.6. In six samples of six members each the following values were obtained:

Sample.   Mean.    $s_i^2$.
1         8433      24,722
2         8200      94,133
3         7933     149,733
4         8120      45,037
5         7971      88,480
6         8263      49,921

with $s_0^2 = 104,588$, $s_a^2 = 75,338$.

Show that $\lambda_{H_1}^{2/N} = 0.8508$ and $\lambda_H^{2/N} = 0.6219$. The 5-per-cent. levels are respectively 0.67 and 0.54, so that there is no evidence of heterogeneity.

(Pearson, appendix to papers by Wilsdon, 1934).


26.7. Verify that the likelihood ratio leads to “Student's” test for an unknown mean in normal samples, to the use of Fisher's $z$ in testing the equality of two variances, and to the $t$-test for the difference of two means in normal populations with the same variance.


26.8. If samples . . . % are drawn from the populations 


dF = i exp ^ ^dx, i = 1 ... k 

use the likelihood ratio to test the hypothesis Hq that the populations are identical, 
showing that 




mo — 'i=i 


n {Xi — xiiT'- 


nifi, say. 


{Xo - iC.i) 


A' 




where $\bar{x}_i$ is the mean of the $i$th sample, $x_{i1}$ is the smallest member of that sample, $\bar{x}_0$ is the mean of all samples together and $x_{01}$ is the smallest value in all samples together.

Show that the distribution of and is 


and hence the moments of Lq are 




]^vr{N-\) * 

F(A+p -1) iii 


r(,-.+ = ) 

n^N r {Ui — 1 ) 


If Hi is the hypothesis that the populations have the same a but any possible different 
jS’s, show that 




''■Hi 


nit i 


where I is the weighted mean of the Z’s, and that 


/r ^ _ N^r{N -k) ^ J 
( i) r{N - h ^ 


rf % — 1 + 


pn. 

A 


0 


pnj 

n,^r(n,-l) ) 


If is the hypothesis that the populations, being known to have identical cr’s, have 
the same show that the distribution of 


^hJ^ 


I 

lo 




(Sukhatme, 1936b).


26.9. In the notation of 26.36 show that, if $H$ is true, the criteria $\lambda_{H_1}$ and $\lambda_{H_2}$ are distributed independently.


(Neyman and Pearson, 1931b).



CHAPTER 27 


GENERAL THEORY OF SIGNIFICANCE-TESTS— (2) 


Bias in Statistical Tests 


27.1. In considering the problem of estimation by confidence intervals in Chapter 19 we had occasion to remark on the rather arbitrary nature of determining the interval so that both inequalities $\theta_1 \le \theta$ and $\theta \le \theta_2$ had an equal chance of fulfilment. A point of a similar nature arises in the testing of hypotheses, particularly when an asymmetrical sampling distribution for the criterion is concerned. Consider, for instance, the testing of the hypothesis that in a normal sample of $n$ members the standard deviation $\sigma$ has an assigned value $\sigma_0$ irrespective of the mean $\mu$. As we have seen in Example 26.3, there is no U.M.P. test for all $\sigma > 0$, though there is one for $\sigma > \sigma_0$ and another for $\sigma < \sigma_0$. In choosing a test to cover the whole range $\sigma > 0$ we have, therefore, a certain freedom of choice, since there exists no “best” test as we have previously defined the term. A common test in practical use is to take the sample variance $s^2$ and accept the hypothesis $\sigma = \sigma_0$ if and only if

$$s_1^2 \le s^2 \le s_2^2, \qquad (27.1)$$

where $s_1^2$ and $s_2^2$ are determined from the distribution of $s^2$, namely


such that 



. (27.2) 

. (27.3) 


In short, $s_1^2$ and $s_2^2$ are chosen so as to cut off equal “tail” areas of the distribution. This procedure will, of course, control errors of the first kind; but so equally well would the selection of $s_1^2$ and $s_2^2$ so that



(27.4) 


and 



. (27.5) 


j^rovided that ( a.j “ a. Thus we have an infinite number of regions which will control 
cri-ors of the first kind. It is natural to seek for some criterion which will distinguish one 
as better than the others, recognizing that no U.M.P. test exists. 


27.2. Such a criterion arises naturally from the following consideration. In the
example given, with $\alpha_1 = \alpha_2 = \tfrac12\alpha$, let us calculate the power of the test for different values
of $\sigma$. This can readily be done from the distributions of type (27.2) by means of the
incomplete $\Gamma$-function or the equivalent integral. For any given $\sigma$ we have to find

$$\int_0^{s_1^2} dF + \int_{s_2^2}^{\infty} dF, \tag{27.6}$$

where



Fig. 27.1, adapted from Neyman and Pearson (1936), shows the relation between
the power function and $\sigma^2$ for $\alpha_1 = \alpha_2 = 0{\cdot}49$, $n = 3$, the rejection level being 0·02.

Horizontal axis: $\sigma^2$ in Sampled Population (in units of $\sigma_0^2$).

Fig. 27.1. Power Curve in Samples of 3 for $\sigma^2$ from a Normal Population (see text).


We see that for $\sigma^2 > \sigma_0^2$ the power increases, and so also for $\sigma^2 < \tfrac12\sigma_0^2$. But
between $\tfrac12\sigma_0^2$ and $\sigma_0^2$ the power is less than 0·02, i.e. less than $1 - \alpha$. Hence for such values
the chance of an error of the second kind, namely, the acceptance of a false hypothesis,
would be greater than the chance of an error of the first kind, namely, the rejection of
a true hypothesis.
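
The bias of the equal-tails test can be checked numerically. The sketch below is an illustration only, not part of the original text. It assumes samples of 3 with unknown mean, so that $\Sigma(x-\bar{x})^2/\sigma^2$ is distributed as $\chi^2$ with 2 degrees of freedom, whose distribution function is $1 - e^{-v/2}$. The function names and the ratio `rho` $= \sigma^2/\sigma_0^2$ are hypothetical labels introduced here.

```python
import math

def equal_tail_limits(reject_level):
    """Limits (a, b) for a chi-square variate with 2 d.f., each tail = reject_level / 2."""
    tail = reject_level / 2.0
    a = -2.0 * math.log(1.0 - tail)   # P(chi2 < a) = tail
    b = -2.0 * math.log(tail)         # P(chi2 > b) = tail
    return a, b

def power(rho, a, b):
    """Probability of rejection when sigma^2 = rho * sigma_0^2.

    The test rejects when sum((x - mean)^2) / sigma_0^2 falls below a or above b;
    that ratio is rho times a chi-square variate with 2 d.f.
    """
    return (1.0 - math.exp(-a / (2.0 * rho))) + math.exp(-b / (2.0 * rho))

a, b = equal_tail_limits(0.02)
for rho in (0.25, 0.5, 0.8, 0.9, 1.0, 1.5, 2.0, 4.0):
    print(f"sigma^2/sigma_0^2 = {rho:4.2f}   power = {power(rho, a, b):.5f}")
```

Under these assumptions the printed power falls below the rejection level for ratios a little under unity, which is the anomaly described in the text.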


27.3. Whether this is felt to be anomalous depends on the relative importance of
the two kinds of error in particular cases; but, other things being equal, it may be felt
more important to avoid the second kind than the first, and not to have a greater probability
of accepting the hypothesis when it is false than of rejecting it when it is true. This, at any
rate, is the basis of the criterion which we proceed to discuss, namely, that the critical region
$w$ should be chosen so that $P\{E \in w\}$ is a minimum when the hypothesis tested is true.

Consider then the case when $H_0$ ascribes to a parameter $\theta$ the value $\theta_0$, and the
admissible alternatives ascribe other values to $\theta$ but do not differ from $H_0$ in other respects. We
shall say that $w$ is an unbiassed critical region if, and only if,

$$\int_w p(\theta_0)\,dx = P\{E \in w \mid \theta_0\} = 1 - \alpha, \tag{27.8}$$

and for any other $\theta$, say $\theta'$,

$$\int_w p(\theta')\,dx = P\{E \in w \mid \theta'\} \ge 1 - \alpha. \tag{27.9}$$

Equation (27.8) expresses the usual control of errors of the first kind and (27.9) the
minimising property of $w$. If a region is not unbiassed it will be said to be biassed.

27.4. In certain cases there will exist among the unbiassed regions a region $w_0$ such that

$$\int_{w_0} p(\theta')\,dx \ge \int_w p(\theta')\,dx \tag{27.10}$$

for all admissible $\theta'$, $w$ being any other unbiassed region. Such a region may be called the
best unbiassed critical region and the test based on it the uniformly most powerful unbiassed
test, or briefly the U.M.P.U. test. It minimises the risk of errors of the second kind among
the class of unbiassed tests. As we shall see presently, U.M.P.U. tests do in fact exist in certain cases.

The use of the word “ unbiassed ” in this connection is rather arbitrary and is not to 
be interpreted as meaning that biassed tests will give systematically wrong results, or that 
unbiassed tests are based on unbiassed estimators. Fortunately the different uses of 
the term “ bias ” usually occur in different contexts and confusion is infrequent. 


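Unbiassed Regions of Type A

27.5. Following Neyman and Pearson, we now define an unbiassed critical region
of Type A as one for which

$$\int_w p(\theta_0)\,dx = 1 - \alpha, \tag{27.11}$$

$$\left[\frac{\partial}{\partial\theta}\int_w p\,dx\right]_{\theta=\theta_0} = 0, \tag{27.12}$$

and

$$\left[\frac{\partial^2}{\partial\theta^2}\int_w p\,dx\right]_{\theta=\theta_0} \text{ is a maximum.} \tag{27.13}$$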


We shall, as usual, assume that the differential coefficients exist and shall also assume that
differentiation may be carried out under the integral sign, so that we have for all $w$,

$$\frac{\partial}{\partial\theta}\int_w p\,dx = \int_w \frac{\partial p}{\partial\theta}\,dx = \int_w p'\,dx, \text{ say,} \tag{27.14}$$

and similarly for the second differential coefficient which we denote by $p''$.

The first condition (27.11) controls errors of the first kind; the second makes the
region $w$ locally unbiassed; the third, (27.13), implies that as $\theta$ departs from $\theta_0$ the power
function increases more rapidly than for any other unbiassed critical region of the same
size. Thus in the neighbourhood of $\theta_0$ the test may be said to be better than others of the
unbiassed type. It may not be better for larger values of $|\theta - \theta_0|$, but the Type A tests
are based on the supposition that it is more important to detect small errors of the second
kind than to minimise the risk of large errors, which will probably be detected in any case.

27.6. The regions of Type A may be found by the use of the following theorem:
the region $w_0$ is an unbiassed critical region of Type A if, within $w_0$,

$$p''(\theta_0) \ge k_1 p'(\theta_0) + k_2 p(\theta_0), \tag{27.15}$$

and outside $w_0$,

$$p''(\theta_0) \le k_1 p'(\theta_0) + k_2 p(\theta_0), \tag{27.16}$$

where

$$p'(\theta_0) = \left[\frac{\partial p}{\partial\theta}\right]_{\theta=\theta_0}, \text{ etc.,}$$

and $k_1$, $k_2$ are chosen so as to satisfy (27.11) and (27.12).

Suppose that $F_0 \ldots F_m$ are functions of $x_1 \ldots x_n$ and that

$$\int_w F_j\,dx = c_j, \text{ a constant}, \quad j = 1 \ldots m. \tag{27.17}$$

Let $w_0$ be a region such that inside it

$$F_0 \ge \sum_{j=1}^{m} k_j F_j, \tag{27.18}$$

and outside it

$$F_0 \le \sum_{j=1}^{m} k_j F_j, \tag{27.19}$$

where the $k$’s are constants chosen so as to satisfy (27.17). Then for any $w$ for which
(27.17) is valid

$$\int_w F_0\,dx \le \int_{w_0} F_0\,dx. \tag{27.20}$$


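In fact, let $ww_0$ be the common part, if any, of $w$ and $w_0$. As both $w$ and $w_0$ satisfy (27.17),
we have

$$\int_{w_0 - ww_0} F_j\,dx = \int_{w - ww_0} F_j\,dx. \tag{27.21}$$

Now

$$\int_{w_0} F_0\,dx - \int_w F_0\,dx = \int_{w_0 - ww_0} F_0\,dx - \int_{w - ww_0} F_0\,dx$$

$$\ge \int_{w_0 - ww_0} \Sigma(k_j F_j)\,dx - \int_{w - ww_0} \Sigma(k_j F_j)\,dx$$

$$= 0,$$

in virtue of (27.21).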

In our present case take $F_0$ as $p''(\theta_0)$ and $F_1$, $F_2$ as $p'(\theta_0)$, $p(\theta_0)$ respectively. Then
(27.20) is true, and hence (27.13) is satisfied if (27.18) and (27.19) are true; and these will
be found to reduce to conditions (27.15) and (27.16). The theorem follows.


27.7. If (27.14) holds, and if there exists a sufficient estimator $t$ for $\theta$, then the
Type A region is bounded by surfaces of constant $t$. For then we have

$$p(\theta) = p_1(t, \theta)\,p_2(x) \tag{27.22}$$

and hence, from (27.15), on substitution,

$$p_1''(t, \theta_0) \ge k_1 p_1'(t, \theta_0) + k_2 p_1(t, \theta_0)$$

within $w_0$, and conversely outside it. The equality must hold on the boundary, which
is equivalent to the theorem.


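27.8. Writing

$$\phi = \left[\frac{\partial}{\partial\theta}\log p\right]_{\theta=\theta_0}, \qquad \phi' = \left[\frac{\partial^2}{\partial\theta^2}\log p\right]_{\theta=\theta_0}, \tag{27.23}$$

we have

$$p'(\theta_0) = \phi\,p(\theta_0), \qquad p''(\theta_0) = (\phi' + \phi^2)\,p(\theta_0), \tag{27.24}$$

and hence the inequality (27.15) reduces to

$$\phi' + \phi^2 \ge k_1\phi + k_2 \tag{27.25}$$

within $w_0$, wherever $p(\theta_0)$ does not vanish; and conversely outside $w_0$.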

We may distinguish three special cases:

(a) If $\phi'$ is a function of $\phi$, say $F(\phi)$, we have

$$F(\phi) + \phi^2 \ge k_1\phi + k_2, \tag{27.26}$$

and the Type A region is bounded by the surfaces

$$\phi = c_j, \quad j = 1 \ldots m, \tag{27.27}$$

where $m$ is the number of roots of (27.26). In this case, as we saw in 17.30, there exists
a sufficient estimator. It follows that $w_0$ is defined by inequalities in $\phi$ bounded by the
surfaces (27.27), and we may, as in 26.24, use the $\phi$’s as new co-ordinates and calculate the
size of a region from their distribution functions.

(b) As a simple case of (a), if

$$\phi' = A + B\phi \tag{27.28}$$

we find, for (27.26),

$$\phi^2 - k_1'\phi - k_2' = 0, \tag{27.29}$$

where $k_1' = k_1 - B$ and $k_2' = k_2 - A$, and the limits of $\phi$ are given by the two roots of this
quadratic.

(c) If $\phi'$ cannot be expressed as a function of $\phi$ which does not involve the $x$’s explicitly,
we shall have

$$\phi' \ge k_2 + k_1\phi - \phi^2. \tag{27.30}$$

In this case, considering $\phi$ and $\phi'$ as two co-ordinates of a point in a plane, we see that
the region for which (27.30) is true is the one “above” the parabola $\phi' = k_2 + k_1\phi - \phi^2$
and that $k_1$, $k_2$ are determined by

$$\int_{-\infty}^{\infty} d\phi \int_{k_2 + k_1\phi - \phi^2}^{\infty} p(\phi, \phi')\,d\phi' = 1 - \alpha, \tag{27.31}$$

$$\int_{-\infty}^{\infty} \phi\,d\phi \int_{k_2 + k_1\phi - \phi^2}^{\infty} p(\phi, \phi')\,d\phi' = 0. \tag{27.32}$$

In this instance we can reduce the problem to two dimensions by using two new co-ordinates
$\phi$, $\phi'$.


Example 27.1

Consider the normal distribution

$$dF = \frac{1}{\sqrt{(2\pi)}}\exp\{-\tfrac12(x - \mu)^2\}\,dx.$$

To apply the foregoing theory with complete rigour we have to show that (27.14) is true.
We shall assume that this is so, referring the reader for a formal proof to Neyman and
Pearson (1936).

We have, then, with $\theta = \mu$,

$$\log p(\mu) = -\tfrac12 n\log(2\pi) - \tfrac12\Sigma(x - \mu)^2,$$

$$\phi = \Sigma(x - \mu_0), \qquad \phi' = -n,$$

and hence this case reduces to that of (27.28). We write

$$\phi = n(\bar{x} - \mu_0),$$

and can clearly use $\bar{x}$ instead of $\phi$ as a co-ordinate, which confirms the result of 27.7 since
$\bar{x}$ is sufficient for $\mu$.

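It follows that the unbiassed region of Type A is given by

$$\bar{x} \le \bar{x}_1, \qquad \bar{x} \ge \bar{x}_2,$$

where

$$\int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})\,d\bar{x} = \alpha$$

and

$$\int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})(\bar{x} - \mu_0)\,d\bar{x} = 0.$$

Now if $H_0$ is true, that is if $\mu = \mu_0$, $\bar{x}$ is distributed in the form

$$dF = \sqrt{\left(\frac{n}{2\pi}\right)}\exp\{-\tfrac12 n(\bar{x} - \mu_0)^2\}\,d\bar{x}.$$

Hence $\bar{x}_1 - \mu_0 = -(\bar{x}_2 - \mu_0)$ and the Type A region is defined as being outside the range

$$\mu_0 - \frac{\lambda}{\sqrt{n}} \le \bar{x} \le \mu_0 + \frac{\lambda}{\sqrt{n}},$$

where $\lambda$ is given by

$$\frac{1}{\sqrt{(2\pi)}}\int_{\lambda}^{\infty} e^{-\frac12 x^2}\,dx = \tfrac12(1 - \alpha).$$

In this case the Type A test leads to the usual test based on equal tail areas. The
same test follows from the likelihood ratio, as the reader can verify for himself.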
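
As an illustration of this example (not part of the original text), the following sketch computes the equal-tails limit for the mean of a normal sample with unit variance and evaluates the power function. It uses `statistics.NormalDist` from the Python standard library. The function names and the sample values in the loop are hypothetical choices made here; `alpha` follows the text's convention of an acceptance probability, so the rejection level is `1 - alpha`.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def critical_lambda(alpha):
    """Value lambda with upper-tail probability (1 - alpha) / 2."""
    return STD_NORMAL.inv_cdf(1.0 - 0.5 * (1.0 - alpha))

def power(mu, mu0, n, alpha):
    """Chance of rejecting mu = mu0 when the true mean is mu (unit variance)."""
    lam = critical_lambda(alpha)
    shift = math.sqrt(n) * (mu - mu0)   # mean of sqrt(n) * (xbar - mu0)
    lower_tail = STD_NORMAL.cdf(-lam - shift)
    upper_tail = 1.0 - STD_NORMAL.cdf(lam - shift)
    return lower_tail + upper_tail

for mu in (-0.6, -0.3, 0.0, 0.3, 0.6):
    print(f"mu = {mu:5.2f}   power = {power(mu, 0.0, 10, 0.98):.5f}")
```

The power is symmetric about `mu0` and is smallest there, which is what the unbiassed property requires of this test.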


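Example 27.2

If the distribution is normal with zero mean and variance $\sigma^2$, and $H_0$ is that $\sigma = \sigma_0$,
we find

$$\phi = \frac{1}{\sigma_0}\left[\frac{\Sigma x^2}{\sigma_0^2} - n\right] = \frac{1}{\sigma_0}(v - n), \text{ say.}$$

This also satisfies (27.28), since $\phi' = -2n/\sigma_0^2 - 3\phi/\sigma_0$, and the Type A region will be
defined by

$$v_2 \le v = \frac{\Sigma x^2}{\sigma_0^2}, \quad \text{or} \quad v \le v_1,$$

where

$$\int_{v_1}^{v_2} p(v)\,dv = \alpha$$

and

$$\int_{v_1}^{v_2} p(v)(v - n)\,dv = 0.$$

Here $p(v)$, the frequency function of the second moment, is

$$p(v) = \frac{v^{\frac12(n-2)}\,e^{-\frac12 v}}{2^{\frac12 n}\,\Gamma(\tfrac12 n)},$$

and we find, for the second equation,

$$\int_{v_1}^{v_2} v^{\frac12 n} e^{-\frac12 v}\,dv - n\int_{v_1}^{v_2} v^{\frac12(n-2)} e^{-\frac12 v}\,dv = 0.$$

Integrating the first member by parts, $v^{\frac12 n}$ being one part, we are left with

$$\left[-2 v^{\frac12 n} e^{-\frac12 v}\right]_{v_1}^{v_2} = 0,$$

or

$$v_1^{\frac12 n} e^{-\frac12 v_1} = v_2^{\frac12 n} e^{-\frac12 v_2}.$$

This has to be solved in conjunction with

$$\frac{1}{2^{\frac12 n}\,\Gamma(\tfrac12 n)}\int_{v_1}^{v_2} v^{\frac12(n-2)} e^{-\frac12 v}\,dv = \alpha.$$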

The numerical solution can be carried out by successive approximation or graphically. 
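
One way of carrying out the successive approximation is sketched below; it is an illustration and not the method of the original authors. It assumes the two conditions of this example in the form $v_1^{n/2} e^{-v_1/2} = v_2^{n/2} e^{-v_2/2}$ and $P(v_1 \le \chi^2_n \le v_2) = \alpha$. All function names are hypothetical. The $\chi^2$ distribution function is evaluated by the standard power series for the regularised incomplete gamma function, so only the standard library is needed.

```python
import math

def chi2_cdf(v, n):
    """Chi-square distribution function with n d.f. (series for the incomplete gamma)."""
    if v <= 0.0:
        return 0.0
    a, x = 0.5 * n, 0.5 * v
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= x / (a + k)
        total += term
        k += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def log_g(v, n):
    """Logarithm of v^(n/2) * exp(-v/2); this rises to a maximum at v = n, then falls."""
    return 0.5 * n * math.log(v) - 0.5 * v

def upper_limit(v1, n):
    """Bisection for the v2 > n at which log_g takes the same value as at v1 < n."""
    target = log_g(v1, n)
    lo, hi = float(n), 2.0 * n
    while log_g(hi, n) > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_g(mid, n) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def type_a_limits(n, alpha):
    """Bisection on v1 in (0, n) until the interval (v1, v2) has probability alpha."""
    lo, hi = 1e-12, float(n)
    for _ in range(200):
        v1 = 0.5 * (lo + hi)
        v2 = upper_limit(v1, n)
        if chi2_cdf(v2, n) - chi2_cdf(v1, n) > alpha:
            lo = v1   # interval too wide: move v1 towards n
        else:
            hi = v1
    v1 = 0.5 * (lo + hi)
    return v1, upper_limit(v1, n)

for n in (2, 5, 20):
    v1, v2 = type_a_limits(n, 0.98)
    print(f"n = {n:2d}   v1 = {v1:.4f}   v2 = {v2:.4f}")
```

The nested bisection works because, for each trial $v_1$ below $n$, the matching $v_2$ above $n$ is unique, and the probability of the interval decreases steadily as $v_1$ moves towards $n$.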

In this connection Fig. 27.2 is of interest. It shows, for samples of two and $\alpha = 0{\cdot}98$,
the graphs of the power function for the ordinary test with equal tail areas, in addition to
the power functions for the Type A test, the U.M.P. test with $\sigma > \sigma_0$ and the U.M.P. test
with $\sigma < \sigma_0$.

Fig. 27.2. Power Curves of Four Different Tests of the Variance in Normal Samples of 2 (see text).

Evidently, for $\sigma > \sigma_0$ the best critical region (2) has the greatest power (as it must
have), and for $\sigma < \sigma_0$ the best region (1) has the greatest power. The test based on equal
tail areas has a greater power than the Type A test for $\sigma > \sigma_0$ but a lower power for $\sigma < \sigma_0$,
besides being biassed, as we have seen.
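
The comparison between the equal-tails and Type A tests can be reproduced for samples of two, where the $\chi^2$ distribution function with 2 degrees of freedom is $1 - e^{-v/2}$. The sketch below is an illustration under that assumption (zero mean known, $v = \Sigma x^2/\sigma_0^2$). The names are hypothetical and `rho` stands for $\sigma^2/\sigma_0^2$.

```python
import math

ALPHA = 0.98  # acceptance probability under the hypothesis tested

def cdf(v):
    """Chi-square distribution function with 2 degrees of freedom."""
    return 1.0 - math.exp(-0.5 * v)

def equal_tail_limits(alpha):
    tail = 0.5 * (1.0 - alpha)
    return -2.0 * math.log(1.0 - tail), -2.0 * math.log(tail)

def upper_from_lower(v1):
    """Bisection for v2 > 2 with v2 * exp(-v2/2) = v1 * exp(-v1/2)."""
    target = math.log(v1) - 0.5 * v1
    lo, hi = 2.0, 4.0
    while math.log(hi) - 0.5 * hi > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.log(mid) - 0.5 * mid > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def type_a_limits(alpha):
    """Bisection on v1 until the interval (v1, v2) has probability alpha."""
    lo, hi = 1e-12, 2.0
    for _ in range(200):
        v1 = 0.5 * (lo + hi)
        if cdf(upper_from_lower(v1)) - cdf(v1) > alpha:
            lo = v1
        else:
            hi = v1
    v1 = 0.5 * (lo + hi)
    return v1, upper_from_lower(v1)

def power(rho, limits):
    """Chance of rejection when sigma^2 = rho * sigma_0^2."""
    v1, v2 = limits
    return cdf(v1 / rho) + 1.0 - cdf(v2 / rho)

equal_tails = equal_tail_limits(ALPHA)
type_a = type_a_limits(ALPHA)
for rho in (0.25, 0.5, 0.9, 1.0, 2.0, 4.0):
    print(f"{rho:4.2f}   equal tails {power(rho, equal_tails):.5f}   Type A {power(rho, type_a):.5f}")
```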

As $n$ becomes larger the same effects persist, but the Type A and the “equal tails”
tests become closer together in power. For samples of 20 or more there seems to be no
serious loss in using the latter since the range of bias and its magnitude are then very small.
If, of course, we knew in practice that $\sigma > \sigma_0$ we should use the U.M.P. test, and cases may
arise, even when such knowledge is lacking, where “one-sided” hypotheses of this kind
are all that concern us.

Invariance Theorem for Type A Regions 

27.9. It is important to show that the regions selected on the basis of Type A criteria
conform to corresponding criteria if some other function $\zeta(\theta)$ is used instead of $\theta$ itself.
In Example 27.2, for instance, where we took $\theta$ to be the standard deviation $\sigma$, should we
have obtained the same regions if we had taken $\theta$ to be the variance? The answer is
affirmative under certain general conditions, as we should expect from the relationship
with sufficient estimators.

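Suppose we have a new parameter $\zeta$, given by

$$\theta = \theta_0 + f(\zeta) = \psi(\zeta) \tag{27.33}$$

where $f(0) = 0$. Then if $p(\psi)$ satisfies (27.14) and the similar equation in second
differentials, if $\psi$ is monotonically increasing and

$$\left[\frac{d\psi}{d\zeta}\right]_{\zeta=0} > 0,$$

then the region based on $\zeta$ is an unbiassed critical region if that based on $\theta$ is so. It is
sufficient to show that (27.15) and (27.16) are satisfied for $\zeta$. Now

$$\theta = \psi(\zeta), \quad \psi(0) = \theta_0, \quad \left[\frac{d\psi}{d\zeta}\right]_{\zeta=0} = \psi', \quad \left[\frac{d^2\psi}{d\zeta^2}\right]_{\zeta=0} = \psi'' \text{ (say).} \tag{27.34}$$

Thus

$$p_\zeta'\{E \mid \psi(0)\} = p_\theta'(E \mid \theta_0)\,\psi',$$

and

$$p_\zeta''\{E \mid \psi(0)\} = p_\theta''(E \mid \theta_0)\,\psi'^2 + p_\theta'(E \mid \theta_0)\,\psi''.$$

Solving these for $p_\theta'$ and $p_\theta''$ and substituting in (27.15) and (27.16), we find

$$p_\zeta''\{E \mid \psi(0)\} \ge k_1'\,p_\zeta'\{E \mid \psi(0)\} + k_2'\,p\{E \mid \psi(0)\} \tag{27.35}$$

within $w$ and the contrary outside, where

$$k_1' = \frac{k_1\psi'^2 + \psi''}{\psi'}, \qquad k_2' = k_2\psi'^2. \tag{27.36}$$

The result follows.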


Regions of Type A₁

27.10. The regions of Type A are determined so that tests based on them are
U.M.P.U. in the neighbourhood of $\theta_0$. We now consider a region, said to be of Type A₁,
which is U.M.P.U. everywhere, i.e. which obeys (27.11) and (27.12) but has, in place of
(27.13),

$$\int_{w_0} p\,dx \ge \int_w p\,dx \tag{27.37}$$

for every admissible $\theta$ and every $w$ satisfying the other two conditions.

It is conceivable that (27.37) does not entail the existence of a U.M.P.U. test, for there
might be an unbiassed region of size $1 - \alpha$ for which the derivative of $\int p\,dx$ did not exist
at $\theta = \theta_0$ but which nevertheless gave a more powerful test. This refinement, however,
need not detain us.

27.11. If $W_+$ represents the sample-space where the density is not zero, if

$$\phi' = A + B\phi$$

and if $\phi(\theta_0)$ does not vanish identically in $W_+$, then the unbiassed critical region of Type A
is necessarily of Type A₁.

Let $w_0$ be the Type A region, which is determined ex hypothesi by two numbers $c_1$
and $c_2$, such that

$$c_1 \le \phi_0 \le c_2 \text{ outside } w_0.$$

We have to show that

$$\int_{w_0} p\,dx \ge \int_w p\,dx \tag{27.38}$$

for all admissible $\theta$ and any $w$ for which

$$\int_w p(\theta_0)\,dx = 1 - \alpha,$$

with the consequence that

$$\int_w p'(\theta_0)\,dx = 0. \tag{27.39}$$

Since $\phi' = A + B\phi$ we have, solving this equation as a linear differential equation
of the first degree,

$$\phi = \left\{\int A\exp\left(-\int B\,d\theta\right)d\theta + T\right\}\exp\int B\,d\theta. \tag{27.40}$$

The reader may verify that this is a solution, and since it contains the arbitrary constant
$T$ it is the most general solution. It follows that we may write

$$\log p = P(\theta) + T\,Q(\theta) + f(x), \text{ say,} \tag{27.41}$$

where $P$ and $Q$ do not depend upon $x$. We then have (primes denoting differentiation with
respect to $\theta$ and the suffix 0 relating to $\theta_0$)

$$\phi_0 = P_0' + T\,Q_0'. \tag{27.42}$$

We note that $Q_0'$ cannot be zero, for if it were we should have

$$\phi_0 = P_0',$$

which would imply that $\phi_0$ was identically zero.

In virtue of the lemma of 27.6, the proposition will be proved if we can show that
for fixed $\theta$ and $\theta_0$ there are two numbers $a$ and $b$, depending on $\theta$ and $\theta_0$ but not on the
$x$’s, such that

$$p \ge p_0(a\phi_0 + b) \text{ inside } w_0 \tag{27.43}$$

and the contrary outside $w_0$. Putting the values of $p$ and $\phi_0$ in this expression, we have
to show that $a$ and $b$ can be found such that, inside $w_0$,

$$\exp\{P(\theta) + T\,Q(\theta) + f(x)\} \ge \exp\{P(\theta_0) + T\,Q(\theta_0) + f(x)\}\,\{aP_0' + aT\,Q_0' + b\}$$

or, writing $r = P(\theta) - P(\theta_0)$, $q = Q(\theta) - Q(\theta_0)$, such that

$$\exp(r + qT) \ge aQ_0'T + aP_0' + b = a_1T + b_1, \text{ say.} \tag{27.44}$$

Here $q$ cannot be zero, for if it were $Q(\theta)$ would be equal to $Q(\theta_0)$ and, integrating the
frequency functions over $W$, we should find $r = 0$. The alternative hypothesis would
not then differ essentially from $H_0$.

Consider at the outset the case when $c_1$ and $c_2$ are different. From (27.42) we see
that $\phi_0$ depends only on $T$ so far as variation in $x$ is concerned, and that

$$\text{if } \phi_0 = c_1, \quad T = \frac{c_1 - P_0'}{Q_0'} = T_1 \text{ (say),} \tag{27.45}$$

$$\text{if } \phi_0 = c_2, \quad T = \frac{c_2 - P_0'}{Q_0'} = T_2 \text{ (say),} \tag{27.46}$$

and $T_1$ and $T_2$ are different. Choose $a_1$ and $b_1$ so as to satisfy

$$a_1T_1 + b_1 = e^{r + qT_1}, \qquad a_1T_2 + b_1 = e^{r + qT_2}. \tag{27.47}$$

Then (27.44) is satisfied at the boundary points and we have merely to prove that

$$c_1 < \phi_0 < c_2 \text{ implies } e^{r + qT} < a_1T + b_1,$$

$$\phi_0 < c_1 \text{ and } \phi_0 > c_2 \text{ imply } e^{r + qT} > a_1T + b_1.$$

This follows from the fact that

$$y = e^{r + qT} - a_1T - b_1 \tag{27.48}$$

has only one minimum, between $T_1$ and $T_2$, as may be seen by differentiating it twice, for
the second derivative is positive and hence the first is a monotonically increasing function.
But $y$ vanishes at $T_1$ and $T_2$ and hence is negative between those values and positive
outside them.

Finally, if $c_1$ and $c_2$ are equal, say to $c$, we choose $a_1$ and $b_1$ so as to satisfy

$$P_0' + Q_0'T_0 - c = 0, \qquad q\,e^{r + qT_0} - a_1 = 0, \qquad e^{r + qT_0} - a_1T_0 - b_1 = 0. \tag{27.49}$$

It will be found that $y$ has a minimum at $T = T_0$ and vanishes there. It follows that in
the region complementary to $w_0$, where $\phi_0 = c$, we have

$$e^{r + qT} = a_1T + b_1,$$

and thus in $w_0$, where $\phi_0 < c$ or $c < \phi_0$, the left-hand side must be greater than the
right-hand side, since $y$ is positive away from its minimum. The demonstration is complete.


Example 27.3 

Consider again the data of Example 27.2. We have already seen that for this
distribution $\phi' = A + B\phi$, so that the regions of Type A are also of Type A₁. Among
unbiassed tests of the hypothesis this is the uniformly most powerful test.


Composite Hypotheses ; Regions of Type B 

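27.12. We now consider the extension of the foregoing results to the case when
$H_0$ is composite. For simplicity we will suppose that there are two parameters $\theta_1$ and $\theta_2$,
$H_0$ specifying $\theta_1$ as say $\theta_{10}$ and leaving $\theta_2$ undetermined. Then a region $w_0$ will be said
to be of Type B if

(a) $\int_{w_0} p(\theta_{10}, \theta_2)\,dx = 1 - \alpha$ for all admissible $\theta_2$; (27.50)

(b) $\int_{w_0} p(\theta_1, \theta_2)\,dx$ may be differentiated twice with respect to $\theta_1$ under the integral
sign;

(c) $\left[\dfrac{\partial}{\partial\theta_1}\int_{w_0} p(\theta_1, \theta_2)\,dx\right]_{\theta_1 = \theta_{10}} = 0$; (27.51)

(d) For any other region $w$ satisfying (27.50),

$$\left[\frac{\partial^2}{\partial\theta_1^2}\int_{w_0} p(\theta_1, \theta_2)\,dx\right]_{\theta_1 = \theta_{10}} \ge \left[\frac{\partial^2}{\partial\theta_1^2}\int_w p(\theta_1, \theta_2)\,dx\right]_{\theta_1 = \theta_{10}}. \tag{27.52}$$

These conditions are obvious generalisations of those defining Type A. Putting now

$$\phi_j = \frac{\partial}{\partial\theta_j}\log p, \quad j = 1, 2, \tag{27.53}$$

$$\phi_{jk} = \frac{\partial\phi_j}{\partial\theta_k} = \phi_{kj}, \quad j, k = 1, 2, \tag{27.54}$$

we state that the Type B region will exist and may be found if $\phi_1$ and $\phi_2$ are algebraically
independent, if

$$\phi_{11} = A_0 + A_1\phi_1 + A_2\phi_2, \qquad \phi_{12} = B_0 + B_1\phi_1 + B_2\phi_2, \qquad \phi_{22} = C_0 + C_2\phi_2, \tag{27.55}$$

and if the law of distribution of $\phi_2$ is uniquely determined by its moments. We omit the
proof of this theorem, for which see Neyman (1935b).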


Simple Hypotheses with Two Parameters : Regions of Type C 

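27.13. The extension of the foregoing theory to the case of a simple hypothesis
specifying several parameters presents some new features. Again to simplify the discussion
we shall consider two parameters, $\theta_1$ and $\theta_2$.

Consider the power function in the neighbourhood of $\theta_1 = \theta_2 = 0$ which we will suppose
to be the values specified by $H_0$. Writing, for the power function,

$$\beta(\theta_1, \theta_2 \mid w) = \int_w p(\theta_1, \theta_2)\,dx, \tag{27.56}$$

$$\beta_j = \left[\frac{\partial\beta}{\partial\theta_j}\right]_{\theta_1 = \theta_2 = 0}, \quad j = 1, 2, \tag{27.57}$$

$$\beta_{jk} = \left[\frac{\partial^2\beta}{\partial\theta_j\,\partial\theta_k}\right]_{\theta_1 = \theta_2 = 0}, \quad j, k = 1, 2, \tag{27.58}$$

we have, assuming an expansion by Taylor’s theorem,

$$\beta(\theta_1, \theta_2 \mid w) = \beta(0, 0 \mid w) + \theta_1\beta_1(w) + \theta_2\beta_2(w) + \tfrac12\{\theta_1^2\beta_{11}(w) + 2\theta_1\theta_2\beta_{12}(w) + \theta_2^2\beta_{22}(w)\} + \ldots \tag{27.59}$$

To extend the idea of unbiassed tests to such a case we require in the first place

$$\beta_1(w) = 0, \qquad \beta_2(w) = 0. \tag{27.60}$$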

Secondly, there will be a minimum at $\theta_1 = \theta_2 = 0$ if

$$\beta_{12}^2 - \beta_{11}\beta_{22} < 0 \tag{27.61}$$

and

$$\beta_{11},\ \beta_{22} > 0. \tag{27.62}$$

If these conditions are satisfied the power function for small values of $\theta_1$ and $\theta_2$ is effectively

$$\beta(\theta_1, \theta_2 \mid w) = 1 - \alpha + \tfrac12\{\theta_1^2\beta_{11} + 2\theta_1\theta_2\beta_{12} + \theta_2^2\beta_{22}\}. \tag{27.63}$$

We may represent this diagrammatically as in Fig. 27.3, which shows one of the ellipses
for which the power function is constant.

Since the hypothesis $H_0$ is that $\theta_1 = \theta_2 = 0$, we may speak of the value $\theta_1$ as the “error
in $\theta_1$”, and similarly for $\theta_2$; and if, as in the case depicted, the co-ordinate axes are not
the same as the principal axes of the ellipse it is clear that for values of $\theta_1$ which are not
zero, errors of positive and negative sign in $\theta_2$ are not equal. From this viewpoint it may
be said that the minimisation of the power function does not control positive or negative
errors to the same extent; for the points A and B in Fig. 27.3 lie on the ellipse of constant
power, so that the probability of detecting them is the same, though A represents a positive
“error” in $\theta_2$ greater than the negative “error” given by B.

Fig. 27.3. Ellipse of Constant Power for Simple Hypothesis with Two Parameters (see text).

27.14. Whether this is a desirable property of the test depends to some extent on 
what the test is intended to do. To avoid the anomaly we must require that 

$$\beta_{12} = 0. \tag{27.64}$$

Furthermore, even if this condition is satisfied and the principal axes of the ellipse coincide 
with the co-ordinate axes, there may still appear anomalies if the length of one axis is greater 
than that of the other ; for then errors in one parameter are not detected as frequently 
as errors of the same size in the other. Here again it is a matter of particular circumstance 
whether such an effect is regarded as objectionable. (We disregard the fact that it can 
be removed by appropriate scaling of the parameters, which may or may not be artificial.) 
To remove it we must require that 

$$\beta_{11} = \beta_{22}, \tag{27.65}$$

so that the ellipses reduce to circles. 

We may refer to the ellipses as “ curves of equidetectability.” 

27.15. With the foregoing explanation in mind we define $w_0$ as a regular unbiassed
critical region of Type C if it obeys the conditions

$$\beta_1(w_0) = \beta_2(w_0) = 0 \tag{27.66}$$

$$\beta_{12}(w_0) = 0 \tag{27.67}$$

$$\beta_{11}(w_0) = \beta_{22}(w_0) \tag{27.68}$$

and if, for any other region $w$ obeying these three conditions and for which

$$\beta(0, 0 \mid w_0) = \beta(0, 0 \mid w) = 1 - \alpha, \tag{27.69}$$

we have

$$\beta_{11}(w_0) \ge \beta_{11}(w). \tag{27.70}$$

Secondly, if a region $w_1$ possesses the property that

$$\beta_1(w_1) = \beta_2(w_1) = 0 \tag{27.71}$$

$$\beta_{12}^2(w_1) - \beta_{11}(w_1)\,\beta_{22}(w_1) < 0 \tag{27.72}$$

and for any other region $w$ obeying the conditions

$$\beta(0, 0 \mid w_1) = \beta(0, 0 \mid w) = 1 - \alpha \tag{27.73}$$

$$\frac{\beta_{11}(w_1)}{\beta_{11}(w)} = \frac{\beta_{12}(w_1)}{\beta_{12}(w)} = \frac{\beta_{22}(w_1)}{\beta_{22}(w)} \tag{27.74}$$

we have

$$\beta_{11}(w_1) \ge \beta_{11}(w) \tag{27.75}$$

we shall say that $w_1$ is a non-regular unbiassed critical region of Type C.

These equations are analytical ways of saying that the regular region of Type C is 
the one, among all regions having circular curves of equidetectability, which has the smallest 
radius for any given value of the power function ; whereas the non-regular region of Type C 
is the one, among all regions having similar ellipses of equidetectability, which has the 
smallest axes. 


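27.16. We now state without proof theorems similar to those demonstrated above
for the case of a single parameter.

Write

$$p_j = \left[\frac{\partial p}{\partial\theta_j}\right]_{\theta_1 = \theta_2 = 0}, \qquad p_{jk} = \left[\frac{\partial^2 p}{\partial\theta_j\,\partial\theta_k}\right]_{\theta_1 = \theta_2 = 0}, \text{ etc.}$$

Then $w_0$ is a regular unbiassed critical region of Type C if
(a) inside $w_0$

$$p_{11} \ge k_1(p_{11} - p_{22}) + k_2p_{12} + k_3p_1 + k_4p_2 + k_5p \tag{27.76}$$

and outside $w_0$ the inequality is reversed;

(b)

$$\int_{w_0} p_j\,dx = \int_{w_0} p_{12}\,dx = \int_{w_0}(p_{11} - p_{22})\,dx = 0, \quad j = 1, 2. \tag{27.77}$$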

Secondly, if $w_1$ satisfies the conditions
(a) that inside $w_1$

$$p_{11} \ge k_1(\gamma_{12}p_{11} - \gamma_{11}p_{12}) + k_2(\gamma_{22}p_{11} - \gamma_{11}p_{22}) + k_3p_1 + k_4p_2 + k_5p \tag{27.78}$$

and outside $w_1$ the inequality is reversed, the $k$’s as usual being constants and the $\gamma$’s obeying
the conditions

$$\gamma_{11} > 0, \qquad \gamma_{12}^2 - \gamma_{11}\gamma_{22} < 0;$$

(b)

$$\int_{w_1} p_j\,dx = \int_{w_1}(\gamma_{12}p_{11} - \gamma_{11}p_{12})\,dx = \int_{w_1}(\gamma_{22}p_{11} - \gamma_{11}p_{22})\,dx = 0, \quad j = 1, 2, \tag{27.79}$$

then $w_1$ is a non-regular unbiassed critical region of Type C, having ellipses of
equidetectability determined by

$$\gamma_{11}\theta_1^2 + 2\gamma_{12}\theta_1\theta_2 + \gamma_{22}\theta_2^2 = \text{constant}. \tag{27.80}$$

27.17. The theorem of invariance of 27.9 no longer holds in general for the present
case. If we transform to new parameters $\zeta_1$ and $\zeta_2$ the equations of transformation

$$\frac{\partial}{\partial\zeta_1} = \frac{\partial\theta_1}{\partial\zeta_1}\frac{\partial}{\partial\theta_1} + \frac{\partial\theta_2}{\partial\zeta_1}\frac{\partial}{\partial\theta_2},$$

etc. will not transform an ellipse co-axial with the co-ordinate axes $\theta_1$, $\theta_2$ into one co-axial
with $\zeta_1$, $\zeta_2$. Thus, in general, the effect of a transformation is to make a regular Type C
region into a non-regular Type C region.


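27.18. As usual, the conditions for the Type C region may be simply written in terms
of the derivatives of $\log p$. Write

$$\phi_j = \left[\frac{\partial}{\partial\theta_j}\log p\right]_{\theta_1 = \theta_2 = 0}, \tag{27.81}$$

$$\phi_{jk} = \left[\frac{\partial^2\log p}{\partial\theta_j\,\partial\theta_k}\right]_{\theta_1 = \theta_2 = 0}. \tag{27.82}$$

Then if

$$\phi_{jk} = A_{jk} + B_{jk}\phi_1 + C_{jk}\phi_2 \tag{27.83}$$

we shall have

$$p_{jk} = (\phi_j\phi_k + A_{jk} + B_{jk}\phi_1 + C_{jk}\phi_2)\,p \tag{27.84}$$

and the inequality (27.76) becomes

$$(1 - k_1)\phi_1^2 - k_2\phi_1\phi_2 + k_1\phi_2^2 - k_3'\phi_1 - k_4'\phi_2 - k_5' \ge 0 \tag{27.85}$$

where the $k'$ are new constants easily expressible in terms of the old. They must be
determined so as to satisfy (27.77), which reduce to

$$\int_{w_0}\phi_j\,p\,dx = \int_{w_0}(\phi_1\phi_2 + A_{12})\,p\,dx = \int_{w_0}\{\phi_1^2 - \phi_2^2 + (A_{11} - A_{22})\}\,p\,dx = 0. \tag{27.86}$$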


Example 27.4

Suppose we have a sample of $n_1$ from a normal population with mean $\mu_1$ and unit
variance and a second sample of $n_2$ from a normal population with mean $\mu_2$, also with unit
variance. The simple hypothesis to be tested is $\mu_1 = \mu_2 = \mu_0$, where $\mu_0$ is some specified
value. We consider two cases:

(i) in which errors of the same size in $\mu_1$ and $\mu_2$ are equally important;

(ii) in which, for some reason, there is a stronger desire to avoid errors in $\mu_2$ than
in $\mu_1$ and that therefore a greater number of members has been taken in the second
sample. We also assume that the sizes of errors judged of equal importance are
inversely proportional to $\sqrt{n}$, so that we are led to consider new parameters

$$\eta_1 = (\mu_1 - \mu_0)\sqrt{n_1}, \qquad \eta_2 = (\mu_2 - \mu_0)\sqrt{n_2}. \tag{27.87}$$

Case 1. The frequency function is

$$p \propto \exp\left[-\tfrac12\left\{\sum_{i=1}^{n_1}(x_{1i} - \mu_1)^2 + \sum_{i=1}^{n_2}(x_{2i} - \mu_2)^2\right\}\right].$$

It will be found that

$$\phi_1 = n_1(\bar{x}_1 - \mu_0), \qquad \phi_2 = n_2(\bar{x}_2 - \mu_0);$$

$$\phi_{11} = A_{11} = -n_1, \qquad \phi_{12} = A_{12} = 0, \qquad \phi_{22} = A_{22} = -n_2.$$

From (27.85) we then find

$$(1 - k_1)n_1^2(\bar{x}_1 - \mu_0)^2 - k_2n_1n_2(\bar{x}_1 - \mu_0)(\bar{x}_2 - \mu_0) + k_1n_2^2(\bar{x}_2 - \mu_0)^2 - k_3'n_1(\bar{x}_1 - \mu_0) - k_4'n_2(\bar{x}_2 - \mu_0) - k_5' \ge 0. \tag{27.88}$$

The law of distribution of $\bar{x}_1$ and $\bar{x}_2$ may be written

$$\propto \exp\left[-\tfrac12\{n_1(\bar{x}_1 - \mu_0)^2 + n_2(\bar{x}_2 - \mu_0)^2\}\right]. \tag{27.89}$$

Put $u = \sqrt{n_1}(\bar{x}_1 - \mu_0)$ and $v = \sqrt{n_2}(\bar{x}_2 - \mu_0)$.

Then the region $w_0$ is determined by

$$(1 - k_1)n_1u^2 - k_2uv\sqrt{(n_1n_2)} + k_1n_2v^2 - k_3'u\sqrt{n_1} - k_4'v\sqrt{n_2} - k_5' \ge 0 \tag{27.90}$$

where

$$\int_{w_0} p(u, v)\,du\,dv = 1 - \alpha,$$

$$\int_{w_0} u\,p(u, v)\,du\,dv = \int_{w_0} v\,p(u, v)\,du\,dv = \int_{w_0} uv\,p(u, v)\,du\,dv = 0, \tag{27.91}$$

$$\int_{w_0}(n_1u^2 - n_2v^2)\,p(u, v)\,du\,dv = (1 - \alpha)(n_1 - n_2), \tag{27.92}$$

and

$$p(u, v) = \frac{1}{2\pi}\exp\{-\tfrac12(u^2 + v^2)\}.$$


It is evident from (27.90) that in the $(u, v)$ plane the boundary of $w_0$ is a conic. From
(27.91) we see that it must be coaxial with the co-ordinate axes and have its centre at the
origin. Hence $k_2 = k_3' = k_4' = 0$. Finally from (27.92) we find that the boundary is
of the form

$$\frac{u^2}{a^2} + \frac{v^2}{b^2} = 1 \tag{27.93}$$

where

$$\frac{1}{a^2} = \frac{(1 - k_1)n_1}{k_5'}, \qquad \frac{1}{b^2} = \frac{k_1n_2}{k_5'}. \tag{27.94}$$

The Type C regions are then defined by (27.93), but we have to express $a$ and $b$ in terms
of known constants, including the probability level $1 - \alpha$. We have to satisfy (27.92),
and will show that a solution always exists.


Put

$$F(a, b) = \frac{1}{2\pi}\iint_{w_0}(n_1u^2 - n_2v^2)\exp\{-\tfrac12(u^2 + v^2)\}\,du\,dv - (n_1 - n_2)(1 - \alpha). \tag{27.95}$$

If the boundary of $w_0$ is a circle, its radius is easily found to be

$$a = b = \sqrt{\{-2\log(1 - \alpha)\}}.$$

The integral $F(a, b)$ outside this circle, by the substitution $u = r\cos\psi$, $v = r\sin\psi$, is
found to be

$$F(a, a) = (n_1 - n_2)\frac{1}{2\pi}\iint_{u^2 + v^2 > a^2} u^2\exp\{-\tfrac12(u^2 + v^2)\}\,du\,dv - (n_1 - n_2)(1 - \alpha)$$

$$= (1 - \alpha)(n_1 - n_2)\{-\log(1 - \alpha)\}.$$

Now taking $w_0$ as the space outside the parallel lines

$$v = \pm\lambda,$$

which is given by $a$ infinite, so that

$$\frac{2}{\sqrt{(2\pi)}}\int_{\lambda}^{\infty} e^{-\frac12 x^2}\,dx = 1 - \alpha,$$

we find

$$F(\infty, \lambda) = -(n_1 - n_2)(1 - \alpha) + \frac{1}{2\pi}\iint_{|v| > \lambda}(n_1u^2 - n_2v^2)\exp\{-\tfrac12(u^2 + v^2)\}\,du\,dv$$

$$= -n_2\lambda\sqrt{\left(\frac{2}{\pi}\right)}e^{-\frac12\lambda^2} < 0.$$

Similarly,

$$F(\lambda, \infty) = n_1\lambda\sqrt{\left(\frac{2}{\pi}\right)}e^{-\frac12\lambda^2} > 0.$$

Thus, since $F(a, b)$ is continuous it must vanish somewhere in the range $\lambda < a < \infty$,
$\lambda < b < \infty$. The values for which it does so define the Type C region.

Case 2. In this case, using the parameters $\eta_1$ and $\eta_2$ of (27.87), we find

$$\phi_1 = u, \qquad \phi_2 = v,$$

$$\phi_{11} = -1, \qquad \phi_{12} = 0, \qquad \phi_{22} = -1.$$

The inequality becomes

$$(1 - k_1)u^2 - k_2uv + k_1v^2 - k_3'u - k_4'v - k_5' \ge 0,$$

where

$$\int_{w_0}(u^2 - v^2)\,p(u, v)\,du\,dv = 0.$$

In a similar way it follows that the Type C region is the one lying outside the circle

$$u^2 + v^2 = -2\log(1 - \alpha).$$

We leave the verification of this result to the reader.
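
Part of that verification can be illustrated numerically. The sketch below is not from the original text. It assumes that, when the hypothesis is true, $u$ and $v$ are independent standard normal variates, so that $u^2 + v^2$ is $\chi^2$ with 2 degrees of freedom. It checks the size of the region outside the circle exactly and estimates the conditions on $u^2 - v^2$ and $uv$ by simulation. The function names, the number of trials and the seed are arbitrary choices made here.

```python
import math
import random

def type_c_radius_squared(alpha):
    """Squared radius of the circle u^2 + v^2 = -2 log(1 - alpha)."""
    return -2.0 * math.log(1.0 - alpha)

def exact_rejection_probability(alpha):
    """P(u^2 + v^2 > r^2) for a chi-square variate with 2 d.f."""
    return math.exp(-0.5 * type_c_radius_squared(alpha))

def simulated_conditions(alpha, trials=200_000, seed=1):
    """Monte Carlo estimates of the size and of two of the integral conditions."""
    rng = random.Random(seed)
    r2 = type_c_radius_squared(alpha)
    hits = 0
    diff_sum = 0.0    # accumulates u^2 - v^2 over the region
    cross_sum = 0.0   # accumulates u * v over the region
    for _ in range(trials):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        if u * u + v * v > r2:
            hits += 1
            diff_sum += u * u - v * v
            cross_sum += u * v
    return hits / trials, diff_sum / trials, cross_sum / trials

size, diff, cross = simulated_conditions(0.98)
print("exact size    :", exact_rejection_probability(0.98))
print("simulated size:", size)
print("integral of (u^2 - v^2) p over the region, estimated:", diff)
print("integral of u v p over the region, estimated        :", cross)
```

The simulation only shows that the circle satisfies the side conditions; it does not by itself establish the maximising property, which rests on the theorem of 27.16.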


Certain Limiting Properties 

27.19. From the foregoing examples it will be seen that in certain cases the optimum 
critical regions are by no means easy to determine numerically ; and it is not always clear 
that the labour involved is repaid by the results. Some consideration has been given by 
various writers to tests which have optimum properties for large n, the presumption being 
that the same tests will be good, if not the best, for small values. As usual when several 
limiting processes are involved simultaneously, the rigorous enunciation and proof of 
theorems in this field is a matter of some complexity, and we shall here merely indicate 
some of the results in very general terms without including proofs. 

It has been shown by Neyman (1938b) that there do exist tests which are unbiassed
in the limit, and rules have been given for finding them. It has also been shown by Wald
(1941a) that there exist tests which are most powerful in the limit, and that such as are
based on maximum likelihood estimators are of this class. The tests are uniformly most
powerful for the single parameter $\theta > \theta_0$ and for $\theta < \theta_0$, but not both; and for any range
they are the most powerful unbiassed tests in the limit. Furthermore, the Type A test
tends to the most powerful unbiassed form.

The general conclusion seems to be that, even where the variation is not normal, most
of the tests in current use which are based on likelihood estimators have optimum properties
in the limit, and may therefore be used confidently for moderate or large samples. For
small samples the position is not so clear, particularly for non-normal variation. Tests
based on inefficient estimators are presumably less satisfactory; and for the non-parametric
case there is as yet no complete theory. On this latter question reference may be
made to a useful review by Scheffé (1943).

The Unbiassed Character of Likelihood-ratio Tests

27.20. It is of some interest to consider how far the tests based on likelihood (26.35) 
are unbiassed. 

It has been shown (Pitman, 1939b; Brown, 1939) that the Neyman-Pearson likelihood-ratio
test in the problem of $k$ samples is biassed unless all the samples are of the same size;
but that Bartlett’s modification (26.42) is unbiassed. We prove this in 27.25 below.
On the other hand, Daly (1940) has shown that in certain multivariate tests such as those 
of regressions, multiple correlations. Hotelling’s T (which we introduce in the next chapter), 
and the ordinary analysis of variance and covariance for orthogonal or non-orthogonal 
data, the likelihood-ratio tests are unbiassed, at least in the Type A sense (i.e. locally) 
and in some cases completely so. 


Pitman’s Method for Location and Scale Parameters 

27.21. In the special but not uncommon case where the hypotheses under test
concern parameters of scale or location, a simplified approach is possible. Suppose the joint
distribution of $k$ sample-values is

$$dF = f(x_1 - \theta_1, x_2 - \theta_2, \ldots, x_k - \theta_k)\,dx_1 \ldots dx_k. \tag{27.96}$$

We seek for a statistic $J$, independent of the $\theta$’s, to test the hypothesis; and clearly, if the
test is to be satisfactory, $J$ must be independent of the origin, i.e. must be seminvariant.
The test that the $\theta$’s are all equal is then equivalent to testing the hypothesis

$$\theta_1 = \theta_2 = \ldots = \theta_k = 0. \tag{27.97}$$

Without loss of generality we may suppose the hypothesis rejected if $J$ is small and less
than some quantity depending on the acceptance value $\alpha$, and we may also suppose $J$
positive; for if either condition is not satisfied we can transfer to some other function of
$J$ for which it is.

In the sample space W, J must be constant along any line parallel to $x_1 = x_2 = \ldots = x_k$,
and therefore the critical region $w_0$ will be the one lying outside a hypercylinder
whose axis is parallel to this line. When $H_0$ is true, the probability of rejection is then

$$\int_{w_0} dF(x_1 \ldots x_k) = 1 - \alpha, \tag{27.98}$$

and when it is not true the probability is

$$\int_{w} dF(x_1 \ldots x_k), \tag{27.99}$$

where w is merely derived from $w_0$ by a translation in W without rotation. If L is any line
parallel to $x_1 = x_2 = \ldots = x_k$, we write

$$P(L) = \int_L dF(x_1 \ldots x_k) = \int f(x_1 \ldots x_k)\, d\eta, \tag{27.100}$$

where

$$\eta = \frac{1}{\sqrt{k}}\, \Sigma(x) ; \tag{27.101}$$

and η is thus the distance of the point $(x_1 \ldots x_k)$ from the plane Σ(x) = 0.

Now if $w_0$ is defined as the locus of all lines for which P(L) > h, a constant, P(L) will
be less than h on any L which is in w but not in $w_0$. Hence

$$\int_{w_0} dF > \int_{w} dF, \tag{27.102}$$

and so the resulting test is unbiassed. Thus an unbiassed test is given by choosing J so
that at any point of a line L it is equal to P(L) at that point. Now we may write for the
variable co-ordinates on a particular L, say, $x_j - t$, so that, apart from sign, $d\eta = \sqrt{k}\, dt$.
Hence

$$P(L) = \sqrt{k} \int_{-\infty}^{\infty} f(x_1 - t,\ x_2 - t,\ \ldots,\ x_k - t)\, dt. \tag{27.103}$$

Taking $J = P(L)/\sqrt{k}$ we find

$$J = \int_{-\infty}^{\infty} f(x_1 - t,\ x_2 - t,\ \ldots,\ x_k - t)\, dt, \tag{27.104}$$

which gives us an unbiassed test.



Example 27.5

Consider the case where the variables are distributed normally with unit variance.

$$f = \frac{1}{(2\pi)^{k/2}} \exp\{-\tfrac12 \Sigma (x_j - \theta_j)^2\}.$$

Then we have, from (27.104),

$$J = \frac{1}{(2\pi)^{k/2}} \int_{-\infty}^{\infty} \exp\{-\tfrac12 \Sigma (x_j - t)^2\}\, dt
   = \frac{e^{-S/2}}{(2\pi)^{(k-1)/2} \sqrt{k}},$$

where $S = \Sigma (x - \bar{x})^2$.

In practice we should take S as our criterion, not J, and reject the hypothesis that
the means were equal if S exceeded some fixed value determined by α. We observe
that in fact S is distributed as χ² with k − 1 degrees of freedom when $H_0$ is true, so that
this value is easily ascertained.
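
The closed form of Example 27.5 can be checked numerically. The following is a minimal sketch in Python, not part of the original text: the function names, the sample values and the integration settings are illustrative assumptions. It evaluates the integral of (27.104) for the unit-variance normal case by the trapezoidal rule and compares it with the expression in S.

```python
import math

def pitman_j_closed_form(xs):
    """J = exp(-S/2) / ((2*pi)^((k-1)/2) * sqrt(k)), with S = sum (x - mean)^2."""
    k = len(xs)
    mean = sum(xs) / k
    s = sum((x - mean) ** 2 for x in xs)
    return math.exp(-0.5 * s) / ((2 * math.pi) ** ((k - 1) / 2) * math.sqrt(k))

def pitman_j_numeric(xs, half_width=12.0, steps=20000):
    """Trapezoidal approximation to the integral of f(x_1 - t, ..., x_k - t) dt."""
    k = len(xs)
    mean = sum(xs) / k
    lo, hi = mean - half_width, mean + half_width
    h = (hi - lo) / steps

    def f(t):
        # Joint unit-variance normal density evaluated at (x_1 - t, ..., x_k - t).
        return math.exp(-0.5 * sum((x - t) ** 2 for x in xs)) / (2 * math.pi) ** (k / 2)

    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, steps))
    return total * h

xs = [0.3, -1.1, 0.8, 1.9]          # illustrative sample values (an assumption)
closed = pitman_j_closed_form(xs)
numeric = pitman_j_numeric(xs)
shifted = pitman_j_closed_form([x + 5.0 for x in xs])  # same S, hence same J
```

Since J depends on the sample only through S, it is unchanged when a constant is added to every observation, which is the seminvariance required in 27.21.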


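27.22. Consider now the case where the frequency function is

$$\frac{1}{\theta_1 \theta_2 \ldots \theta_k}\, f\!\left(\frac{x_1}{\theta_1}, \ldots, \frac{x_k}{\theta_k}\right). \tag{27.105}$$

If the x’s are positive in range we put

$$y_j = \log x_j, \qquad \phi_j = \log \theta_j, \tag{27.106}$$

and for the frequency function of the y’s we find

$$\exp(\Sigma y - \Sigma \phi)\, f(e^{y_1 - \phi_1}, \ldots, e^{y_k - \phi_k}). \tag{27.107}$$
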
This reduces to our first case, and we have an unbiassed criterion that

$$\phi_1 = \phi_2 = \ldots = \phi_k$$

by putting

$$J = \int_{-\infty}^{\infty} \exp(\Sigma y - kt)\, f(e^{y_1 - t}, \ldots, e^{y_k - t})\, dt
   = \prod_{j=1}^{k} x_j \int_0^{\infty} f\!\left(\frac{x_1}{t}, \frac{x_2}{t}, \ldots, \frac{x_k}{t}\right) \frac{dt}{t^{k+1}}. \tag{27.108}$$

When the x’s are not necessarily positive the expression remains the same, except that in
(27.108) Π(x) becomes Π(|x|). Small values of J are significant.

27.23. Suppose now that our hypothesis asserts the equality of θ’s or φ’s and
states that they have a common value $\theta_0$ or $\phi_0$ as the case may be. Then if we take

$$J' = \left(\prod_{j=1}^{k} x_j\right) f(x_1 \ldots x_k), \tag{27.109}$$

the test will be unbiassed. Moreover, if we regard small values of J' as significant and the
x’s are independent, and if each frequency function is unimodal, then when

$$\theta_1 = \theta_2 = \ldots = \theta_k = \theta_0$$

is not true the probability that J' exceeds the specified limit based on 1 − α increases as
any θ tends to $\theta_0$. J' therefore provides an unbiassed test.

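27.24. Finally, consider the case of k variates each distributed in the form typified by

$$dF = \frac{1}{\theta\, \Gamma(m)} \exp\!\left(-\frac{x}{\theta}\right) \left(\frac{x}{\theta}\right)^{m-1} dx. \tag{27.110}$$

Their joint distribution is

$$dF = \frac{\Pi\{(x/\theta)^{m-1}\}}{\Pi\{\Gamma(m)\}} \exp\!\left(-\Sigma\, \frac{x}{\theta}\right) \Pi\!\left(\frac{dx}{\theta}\right). \tag{27.111}$$

Hence, to test the hypothesis that the samples have the same θ we have

$$J = \frac{\Pi(x^m)}{\Pi\{\Gamma(m)\}} \int_0^{\infty} e^{-\Sigma(x)/t}\, \frac{dt}{t^{M+1}}
   = \frac{\Gamma(M)\, \Pi(x^m)}{\Pi\{\Gamma(m)\}\, \{\Sigma(x)\}^M}, \tag{27.112}$$

where M = Σ(m). It is sometimes convenient to deal with

$$K = \frac{\Pi(x^m)}{\{\Sigma(x)\}^M}, \tag{27.113}$$

which differs from J only by a constant factor. The maximum value of K is $\Pi(m^m)/M^M$,
and we put

$$L = -\log K + \log \max K = M \log \frac{\Sigma(x)}{M} - \Sigma\!\left(m \log \frac{x}{m}\right). \tag{27.114}$$

L is essentially not negative, and large values are significant.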
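
The two properties just claimed for L can be illustrated numerically. The sketch below is not from the original text and assumes that (27.114) reads $L = M \log\{\Sigma(x)/M\} - \Sigma\{m \log (x/m)\}$ ; the function name and the numerical values are hypothetical. It shows that L vanishes when the x’s are proportional to the m’s (where K attains its maximum), is positive otherwise, and is unaffected by a common change of scale.

```python
import math

def scale_criterion_L(xs, ms):
    """L = M*log(sum(x)/M) - sum(m*log(x/m)), assumed form of (27.114)."""
    M = sum(ms)
    return M * math.log(sum(xs) / M) - sum(m * math.log(x / m) for x, m in zip(xs, ms))

ms = [1.0, 2.0, 3.0]                       # indices m of the Type III variates (illustrative)
L_at_maximum = scale_criterion_L([2.0, 4.0, 6.0], ms)   # x proportional to m
L_general = scale_criterion_L([0.5, 4.0, 9.0], ms)
L_rescaled = scale_criterion_L([5.0, 40.0, 90.0], ms)   # same data multiplied by 10
```

The invariance under a change of scale is what we should expect of a criterion for the equality of the θ’s which makes no statement about their common value.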
For testing the hypothesis that a set of variances have some specified equal value $\theta_0$, we
find similarly from (27.109)

$$L' = \frac{\Sigma(x)}{\theta_0} - M - \Sigma\!\left(m \log \frac{x}{m\,\theta_0}\right). \tag{27.115}$$



27.25. The foregoing result has an immediate application to the case of k normal
samples, for the variances are then distributed in the Type III form of equation (27.110).
The criterion L becomes

$$L = -\tfrac12\, \Sigma\!\left(\nu \log \frac{N s^2}{\Sigma(\nu s^2)}\right), \tag{27.116}$$

where ν as usual represents the number of degrees of freedom, $s^2$ the corresponding
estimate of variance, and N = Σ(ν). This, as
will be seen by comparison with (26.93), is equivalent to Bartlett’s test, and shows that
it is unbiassed.
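
The passage from the Type III criterion to the criterion for normal variances can be traced numerically. In the sketch below, which is an illustration and not part of the original text, we assume that an estimate of variance $s^2$ with ν degrees of freedom corresponds to a Type III variate $x = \nu s^2/2$ with index $m = \nu/2$ (the common population variance cancels), and that (27.114) and (27.116) have the forms quoted in the comments. The function names and data are hypothetical.

```python
import math

def variance_criterion_L(variances, dofs):
    """Assumed form of (27.116): L = -1/2 * sum(nu * log(N*s2 / sum(nu*s2)))."""
    N = sum(dofs)
    pooled = sum(nu * s2 for nu, s2 in zip(dofs, variances)) / N
    return -0.5 * sum(nu * math.log(s2 / pooled) for nu, s2 in zip(dofs, variances))

def type_iii_criterion_L(xs, ms):
    """Assumed form of (27.114): L = M*log(sum(x)/M) - sum(m*log(x/m))."""
    M = sum(ms)
    return M * math.log(sum(xs) / M) - sum(m * math.log(x / m) for x, m in zip(xs, ms))

variances = [2.1, 3.4, 1.2]   # illustrative estimates of variance
dofs = [9, 14, 6]             # their degrees of freedom

L_direct = variance_criterion_L(variances, dofs)
L_from_type_iii = type_iii_criterion_L(
    [nu * s2 / 2 for nu, s2 in zip(dofs, variances)],
    [nu / 2 for nu in dofs],
)
L_equal_variances = variance_criterion_L([1.7, 1.7, 1.7], dofs)
```

The two routes agree, and the criterion vanishes when all the estimates of variance coincide.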


NOTES AND REFERENCES 

For the theory of unbiassed tests see particularly Neyman and Pearson (1936 ; 1938) 
and Neyman (1935^). Regions of Type B have also been considered by Scheffé (1942a),
who discusses a Type $B_1$ standing in relation to B as Type $A_1$ to Type A.

For limiting properties see Neyman (1938b) and Wald (1941a).

See also references to the previous chapter. 


EXERCISES 


27.1. Show that the test of Example 27.1 provides regions which are of Type $A_1$
as well as of Type A, and that the test is a U.M.P.U. one.

27.2. Show that the cumulants of the distribution of L of (27.114) are

$$\kappa_1 = M\{G_1(M) - \log M\} - \Sigma[m\{G_1(m) - \log m\}],$$

$$\kappa_r = (-1)^r \{\Sigma\, m^r G_r(m) - M^r G_r(M)\}, \qquad r > 1,$$

where

$$G_r = \frac{d^r}{dm^r} \log \Gamma(m).$$

Hence show that the cumulants of $2L/(1 + \beta)$ are approximately $(k - 1)\, 2^{r-1}\, \Gamma(r)$, where

$$\beta = \frac{1}{6(k - 1)} \left\{\Sigma\!\left(\frac{1}{m}\right) - \frac{1}{M}\right\},$$

and thus that $2L/(1 + \beta)$ is distributed approximately as χ² with k − 1 degrees of freedom.

(Bartlett, 1937c ; Pitman, 1939b.)
27.3. Show that in samples of 3 from a normal population the distribution of the
range r is given by

$$dF = \frac{6}{\sigma \sqrt{\pi}}\, e^{-r^2/4\sigma^2} \left\{ \int_0^{r/(\sigma\sqrt{6})} \frac{1}{\sqrt{(2\pi)}}\, e^{-\frac12 y^2}\, dy \right\} dr.$$



Hence that an unbiassed critical region of Type A is given by 


the region lying outside fi <r <r 


1 

I 

fve 

e 

dy 


0 

L 

J 0 

- 

n 


rh 



A 


ir? V6 Q~lv^ 

dy = 

fa 

1 I VC 

e“ 

Jo 



Jo 


Ir <r2- 






(Neyman and Pearson, 1936.) 



CHAPTER 28 


MULTIVARIATE ANALYSIS 

28.1. We have already considered some aspects of the case in which each member
of a population is characterised by several variates $x_1 \ldots x_p$. For instance, we have
examined the measurement of correlation between the variates and the regression of one
variate on some or all of the others. In this chapter we shall extend our inquiries into 
the multivariate case a good deal further, mainly by taking into account the possibility 
that different sample-members may have emanated from different populations. This 
will lead to some generalisations of the methods already discussed for the univariate case, 
such as tests of homogeneity and tests of differences between two samples. Some of our 
known results generalise with nothing more than additional mathematical complexity; 
but in others certain new features appear, and the theory of multivariate analysis is not 
entirely a matter of generalising univariate results to p dimensions. 

28.2. One or two examples will illustrate the kind of problem with which we are 
concerned. A number of skulls are discovered in a burial-ground. They are found to 
vary among themselves in the manner usual in biological material. Is the observed varia- 
tion consistent with the hypothesis that all the skulls were derived from members of the 
same race or does it suggest a mixture of racial types ? If heterogeneity is indicated, do 
the skulls fall into two well-defined categories, such as we might expect if the burial-ground 
were the site of a battle between two races such as Saxon and Celt ; or are there several 
types such as we should expect in the normal burial-ground of a town where races were 
living together and interbreeding ? Or again, if the skulls are compared with another set 
known to have been buried at a much earlier time from the same race, is there any evidence 
of a significant change in skulls from one period to the other ? 

There is no single measurement on a skull which is marked out from the infinite number 
of possible measurements for deciding questions of this kind. It is quite common for 
thirty or forty measurements to be taken by craniometricians on a single skull. Even if 
we reject many of these for practical reasons, leaving out the jawbone, for instance, because 
it is often separated from the skull and cannot be identified, we shall still be left with a 
number p which require consideration. For n skulls we shall then have n sets of p values 
corresponding to variates $x_1 \ldots x_p$ which are, in general, correlated among themselves
and may be highly so. Our problem is to test the homogeneity of these values, or to estimate
differences between parent populations from which they were derived. We may,
of course, apply methods which are already familiar by picking out one variate and testing 
for homogeneity. But we might pick out quite an unsuitable one and sacrifice most of the 
information. Even if time permits we cannot take each variate in turn and test it because 
the variates are correlated and our p tests are not independent. 

28.3. Again, suppose we have two different breeds of laying hen and are given a 
batch of eggs from the hen-run without knowing which hen laid which egg. We require 
to allocate the eggs to the two breeds. Assuming that there is no decisive criterion such 
as colour of shell, we may measure various properties of the eggs such as length, breadth, 
weight, volume, specific gravity and so on. Some of these measurements will be highly
correlated or, in the extreme case, perfectly correlated, as with weight, volume and specific 
gravity. In such circumstances we may reject some variates as redundant ; but in general 
we shall be left with several sets of measurements. Our problem is to find some method 
based on the retained variates for allocating the eggs to the correct parent breed. In 
particular we might search for the best linear function of the variates to discriminate between 
breeds and to enable us to assign the eggs with the maximum probability of correctness. 

28.4. Throughout the whole chapter we shall, except when the contrary is stated,
assume that the variation is normal. In addition, to render our formulae a little less
cumbrous we shall borrow a summation convention from the tensor calculus. If the
affixes i, j range from 1 to p we shall write

$$A^{ij} a_{ij} = \sum_{i,j=1}^{p} A^{ij} a_{ij}, \tag{28.1}$$

the affixes to A being regarded as ordinary superscripts, not as powers. Similarly we
shall have

$$A^{ij} a_{ik} = \sum_{i=1}^{p} A^{ij} a_{ik}. \tag{28.2}$$

Whenever an affix occurs as a superscript and a subscript, summation is to be understood.
Clearly the actual letter used is a dummy and we have, for instance,

$$A^{ij} a_{ij} = A^{kj} a_{kj} = A^{kl} a_{kl}. \tag{28.3}$$

We shall write the array of values $A^{ij}$ (a square matrix) as $(A^{ij})$ and its determinant
as $|A^{ij}|$ or simply as |A|.

To every matrix $(a_{ij})$ with a non-vanishing determinant there corresponds a reciprocal
or inverse matrix which we may write $(a^{ij})$. Since

$$(a_{ij})(a^{ij}) = 1,$$

we have, on carrying out the multiplication,

$$a_{ij} a^{ik} = 1, \quad j = k ; \qquad a_{ij} a^{ik} = 0, \quad j \neq k,$$

which we may express as

$$a_{ij} a^{ik} = \delta_j^k, \tag{28.4}$$

where $\delta_j^k$, one form of the Kronecker delta, is zero if j ≠ k and unity otherwise. The quantity
$a^{ij}$ is the minor of $a_{ij}$ in |a| divided by |a| itself.
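
The relation (28.4) between a matrix and its reciprocal may be verified on a small numerical case. The sketch below is an illustration only: the helper names and the correlation matrix are assumptions, and the inverse is found by ordinary Gauss-Jordan elimination rather than by minors. The contraction over the repeated affix i is written out as an explicit sum.

```python
def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("determinant vanishes; no reciprocal matrix exists")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def contract(lower, upper):
    """delta[j][k] = sum over i of lower[i][j] * upper[i][k] (summation convention)."""
    n = len(lower)
    return [[sum(lower[i][j] * upper[i][k] for i in range(n)) for k in range(n)]
            for j in range(n)]

# An illustrative symmetric (correlation) matrix a_ij.
a_lower = [[1.0, 0.5, 0.2],
           [0.5, 1.0, 0.3],
           [0.2, 0.3, 1.0]]
a_upper = inverse(a_lower)          # the reciprocal matrix a^ij
delta = contract(a_lower, a_upper)  # should reproduce the Kronecker delta
```

For a symmetric matrix, such as every dispersion or correlation matrix in this chapter, it is immaterial which affix of $a_{ij}$ is summed.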

28.5. It will further simplify our formulae and will give rise to no loss of generality
if we suppose our variates to be in standard measure, that is to say, to have zero mean
and unit variance. If we require results for the more general case we can easily obtain
them from transformations of the type

$$\xi = \mu + \sigma x. \tag{28.5}$$

With this convention the equation of the multivariate normal distribution (cf. 15.12,
vol. I, p. 376) may be written

$$dF = \frac{|A|^{\frac12}}{(2\pi)^{\frac12 p}} \exp(-\tfrac12 A^{ij} x_i x_j)\, dx_1 \ldots dx_p, \tag{28.6}$$

where the A’s are related to the correlation determinant

$$|\rho_{ij}|. \tag{28.7}$$

In fact $(A^{ij})$ is reciprocal to $(\rho_{ij})$, as we saw in 15.12.

28.6. We shall also frequently refer to the matrix of sample variances and covariances
which we shall call the dispersion matrix and write as $(a_{ij})$, where

$$a_{ij} = \frac{1}{n} \sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j), \qquad i, j = 1, \ldots, p. \tag{28.8}$$

This, it is to be remembered, is in standard measure for the population, that is to say the
observed variates are taken from the parent means and divided by the parent standard
deviations.


Wishart’s Distribution


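28.7. We now proceed to generalise to p variates the joint distribution of dispersions
arrived at in 14.12 (vol. I, p. 339) for the bivariate case ; and we shall also show that
the distribution is independent of that of means. The result and method of proof are
due to Wishart (1928).

First of all let us write the result for the bivariate case in our new notation. For
the distribution of means we have

$$dF = \frac{n\, |A|^{\frac12}}{2\pi} \exp\!\left(-\frac{n}{2} A^{ij} \bar{x}_i \bar{x}_j\right) d\bar{x}_1\, d\bar{x}_2, \qquad i, j = 1, 2, \tag{28.9}$$

and for that of dispersions

$$dF = \frac{(\frac12 n)^{n-1}\, |A|^{\frac12(n-1)}}{\pi^{\frac12}\, \Gamma\{\frac12(n-1)\}\, \Gamma\{\frac12(n-2)\}}\, |a|^{\frac12(n-4)} \exp\!\left(-\frac{n}{2} A^{ij} a_{ij}\right) da_{11}\, da_{12}\, da_{22}. \tag{28.10}$$

For instance, we have

$$a_{11} = s_1^2, \qquad a_{12} = r s_1 s_2, \qquad a_{22} = s_2^2,$$

so that (28.10) is equivalent to

$$dF = \frac{4\, (\frac12 n)^{n-1}}{\pi^{\frac12}\, \Gamma\{\frac12(n-1)\}\, \Gamma\{\frac12(n-2)\}\, (1 - \rho^2)^{\frac12(n-1)}}\, (1 - r^2)^{\frac12(n-4)}\, s_1^{n-2} s_2^{n-2}$$
$$\times \exp\!\left\{-\frac{n}{2(1 - \rho^2)}\, (s_1^2 - 2\rho r s_1 s_2 + s_2^2)\right\} ds_1\, ds_2\, dr.$$

This, with the substitution

$$\Gamma\!\left(\frac{n-1}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right) = \frac{\pi^{\frac12}\, \Gamma(n-2)}{2^{n-3}},$$

is the form found in equation (14.44), vol. I, p. 342, when it is remembered that we are
working in standard measure.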
28.8. Now consider the general case. With a sample of n values of p variates we
consider p rectangular spaces of n dimensions each as the domain of variation. If a point
in one of these spaces be fixed, the variation in the other spaces is constrained for fixed
values of the sample dispersions. The following argument is a generalisation of that given
in 14.12 leading to the bivariate result, and the reader may like to refresh his memory
by re-reading that section.

Writing $x_{jk}$ (k = 1, ... n) for the n values of the j-th variate, we have for the density function
of the whole sample, from (28.6),

$$f = \frac{|A|^{\frac12 n}}{(2\pi)^{\frac12 np}} \prod_{k=1}^{n} \exp(-\tfrac12 A^{ij} x_{ik} x_{jk})$$
$$= \frac{|A|^{\frac12 n}}{(2\pi)^{\frac12 np}} \exp\!\Big[-\tfrac12 A^{ij}\, \Sigma\{(x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)\}\Big] \times \exp\!\left(-\frac{n}{2} A^{ij} \bar{x}_i \bar{x}_j\right). \tag{28.11}$$

We may thus factorise the density function into two parts,

$$\frac{n^{\frac12 p}\, |A|^{\frac12}}{(2\pi)^{\frac12 p}} \exp\!\left(-\frac{n}{2} A^{ij} \bar{x}_i \bar{x}_j\right) \tag{28.12}$$

and

$$\frac{|A|^{\frac12(n-1)}}{n^{\frac12 p}\, (2\pi)^{\frac12 p(n-1)}} \exp\!\left(-\frac{n}{2} A^{ij} a_{ij}\right), \tag{28.13}$$

where we have chosen the constant factor of (28.12) so that the distribution shall have the total
frequency unity.


Consider now the volume element $\prod_{k=1}^{n} dx_{1k}\, dx_{2k} \ldots dx_{pk}$. In any particular n-space
the density is constant over hyperspheres centred at the mean. The volume element may
then be represented as the product of elements $d\bar{x}_j$ and of independent elements depending
on dispersions. In the total space of pn dimensions the volume element may thus be
represented as the product of p elements $d\bar{x}_j$ and an independent element depending on
dispersions. Thus the volume element also factorises, and we have immediately for the
distribution of means

$$dF = \frac{n^{\frac12 p}\, |A|^{\frac12}}{(2\pi)^{\frac12 p}} \exp\!\left(-\frac{n}{2} A^{ij} \bar{x}_i \bar{x}_j\right) \prod_{j=1}^{p} d\bar{x}_j, \tag{28.14}$$

showing that the means are distributed in the multivariate normal form independently
of dispersions.

If we define a matrix (B) with elements ½n times those of (A), we may write the
distribution of means in the simple form

$$dF = \frac{|B|^{\frac12}}{\pi^{\frac12 p}} \exp(-B^{ij} \bar{x}_i \bar{x}_j)\, \Pi\, d\bar{x}. \tag{28.15}$$

We note that this checks with the known results for p = 1 and p = 2. It is also seen
almost at once that the variance of $\bar{x}_j$ is $\sigma_j^2/n$, as we expect.


28.9. We have now to consider the more complicated expression for the volume
element of dispersions. Let us in the first instance transfer our origins to the sample means,
remembering that in doing so we have lost one dimension (or degree of freedom) in the
variation of our sample-points. Let $P_1 \ldots P_p$ be the sample-points whose co-ordinates
are the n values of $x_1 \ldots x_p$, one point P lying in each n-space. We shall consider in
turn the variation of $P_1$, then that of $P_2$ for fixed $P_1$, then that of $P_3$ for fixed $P_1$ and $P_2$,
and so on. The total variation will be given by multiplying the various expressions so
obtained ; and it will be sufficient if we consider the typical case of the variation of $P_m$
for m − 1 fixed points $P_1 \ldots P_{m-1}$.

For a fixed length $OP_m$ and fixed angles with $OP_1 \ldots OP_{m-1}$, $P_m$ can vary on a
hypersphere of n − m dimensions ; for, if we fix any particular angle, $P_m$ is constrained
to lie on a hypercone which cuts its hypersphere of variation in a hypersphere of one fewer
dimensions, and the fixation of the origin at the sample mean imposes a further constraint.
Further, if we regard the p spaces as superposed, as we may, the centre of this (n − m)-dimensional
hypersphere is the foot of the perpendicular from $P_m$ on to the space containing
the points O, $P_1 \ldots P_{m-1}$. Call the length of this perpendicular for the time being $r_m$.

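The volume of a q-dimensional hypersphere of radius r is

$$\frac{\pi^{\frac12 q}\, r^q}{\Gamma(\frac12 q + 1)},$$

and its surface area, obtained by differentiating with respect to r, is

$$\frac{2 \pi^{\frac12 q}\, r^{q-1}}{\Gamma(\frac12 q)}. \tag{28.16}$$

The surface area of the hypersphere of variation of $P_m$ is thus

$$\frac{2 \pi^{\frac12(n-m)}\, r_m^{\,n-m-1}}{\Gamma\{\frac12(n-m)\}}. \tag{28.17}$$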
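
The hypersphere formulae used here are the standard ones, volume $\pi^{q/2} r^q / \Gamma(\frac12 q + 1)$ and surface area $2\pi^{q/2} r^{q-1} / \Gamma(\frac12 q)$, and the relation between them can be checked numerically. The following sketch is illustrative only (the function names are our own, and q stands for the number of dimensions) ; it confirms the familiar cases q = 2 and q = 3 and compares the surface area with a central-difference derivative of the volume.

```python
import math

def ball_volume(q, r):
    """Volume of a q-dimensional hypersphere of radius r."""
    return math.pi ** (q / 2) * r ** q / math.gamma(q / 2 + 1)

def sphere_surface(q, r):
    """Surface area of the same hypersphere, i.e. d(volume)/dr."""
    return 2 * math.pi ** (q / 2) * r ** (q - 1) / math.gamma(q / 2)

def numeric_derivative(q, r, h=1e-6):
    """Central-difference estimate of d(volume)/dr, for comparison."""
    return (ball_volume(q, r + h) - ball_volume(q, r - h)) / (2 * h)

circle_area = ball_volume(2, 3.0)          # pi * r^2
sphere_area = sphere_surface(3, 2.0)       # 4 * pi * r^2
surface_7 = sphere_surface(7, 1.5)
derivative_7 = numeric_derivative(7, 1.5)
```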


To find the element of volume due to the variation of $OP_m$ and the angles which $OP_m$
makes with $OP_1 \ldots OP_{m-1}$ we have to multiply (28.17) by an element of variation
normal to the hypersphere of n − m dimensions. This variation lies in the hyperplane
determined by the origin and $P_1 \ldots P_m$ which is, in fact, normal to the hypersphere.
To evaluate it, consider the transformation

$$l_{mj} = \sum_{k=1}^{n} x_{mk} x_{jk}, \qquad j = 1, \ldots, m, \tag{28.18}$$

where, of course, the x’s are measured from the sample means in virtue of our choice of
origin. We have for the Jacobian

$$J = \frac{\partial(l_{m1} \ldots l_{mm})}{\partial(x_{m1} \ldots x_{mm})}
 = \begin{vmatrix}
 x_{11} & x_{12} & \ldots & x_{1m} \\
 x_{21} & x_{22} & \ldots & x_{2m} \\
 \vdots & \vdots & & \vdots \\
 2x_{m1} & 2x_{m2} & \ldots & 2x_{mm}
 \end{vmatrix}
 = 2 v_m, \tag{28.19}$$

where $v_m$ is the volume (or “content”) of the hyperparallelopiped having one corner at
the origin and edges running to the points $P_1 \ldots P_m$. Furthermore,

$$v_m^2 = |x_{jk}|^2 = |l_{jk}|, \qquad j, k = 1, \ldots, m. \tag{28.20}$$

The required element is thus

$$\frac{1}{2 v_m} \prod_{k=1}^{m} dl_{mk},$$

and the total element of variation of $P_m$, on multiplication by (28.17), is

$$\frac{\pi^{\frac12(n-m)}\, r_m^{\,n-m-1}}{\Gamma\{\frac12(n-m)\}\, v_m} \prod_{k=1}^{m} dl_{mk}. \tag{28.21}$$

Now $r_m$ is the length of the perpendicular from $P_m$ on to the space $OP_1 \ldots P_{m-1}$
and is therefore equal to $v_m / v_{m-1}$. Hence, for the variation of $P_m$ we have the element

$$\frac{\pi^{\frac12(n-m)}}{\Gamma\{\frac12(n-m)\}}\, \frac{v_m^{\,n-m-2}}{v_{m-1}^{\,n-m-1}} \prod_{k=1}^{m} dl_{mk}. \tag{28.22}$$

We now derive the total element for variation of $P_1 \ldots P_p$ by multiplying expressions
of type (28.22) for m = 1, 2, ... p. The terms in v cancel except $v_p$ and $v_0$, the latter
being unity, and we find

$$\frac{\pi^{\frac14 p(2n-p-1)}\, v_p^{\,n-p-2}}{\prod_{k=1}^{p} \Gamma\{\frac12(n-k)\}} \prod_{j=1}^{p} \prod_{k=1}^{j} dl_{jk}. \tag{28.23}$$



Now from (28.18) we have

$$l_{jk} = n a_{jk}, \tag{28.24}$$

and from (28.20)

$$v_p^2 = n^p\, |a|. \tag{28.25}$$

Making the necessary substitutions in (28.23) and adjoining the frequency element given
by (28.13) we find, after a little reduction,

$$dF = \frac{(\frac12 n)^{\frac12 p(n-1)}\, |A|^{\frac12(n-1)}}{\pi^{\frac14 p(p-1)} \prod_{k=1}^{p} \Gamma\{\frac12(n-k)\}}\, |a|^{\frac12(n-p-2)} \exp\!\left(-\frac{n}{2} A^{ij} a_{ij}\right) \Pi\, da. \tag{28.26}$$

This is Wishart’s generalisation of the distribution of dispersions in a multivariate
normal system. The reader who feels that the foregoing proof demands too much of his
powers of geometrical insight may refer to alternative derivations by Wishart and Bartlett
(1933r) or P. L. Hsu (1939a). The domain of variation of the a’s is 0 to ∞ for $a_{ii}$ and
corresponding values for $a_{ij}$, i ≠ j, such that correlations do not exceed unity in absolute
value.


28.10. It must be remembered that we are regarding $a_{ij}$ as the same as $a_{ji}$ and that
the product of differential elements in (28.26) contains ½p(p + 1) items, not p² ; for there
are p elements of the form $da_{ii}$ and ½p(p − 1) of the form $da_{ij}$, i ≠ j. The expanded form
of $A^{ij} a_{ij}$, however, takes place over i, j from 1 to p, so that any particular term such as
$A^{12} a_{12}$ occurs twice, once as $A^{12} a_{12}$ and once as $A^{21} a_{21}$ ; except that when i = j the term
occurs once. For instance, with p = 2 we have

$$A^{ij} a_{ij} = A^{11} a_{11} + 2A^{12} a_{12} + A^{22} a_{22}. \tag{28.27}$$

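We can now derive the characteristic function of the Wishart distribution. Ignoring
constant factors and writing a single integral sign for summation over all $a_{ij}$, we have
from (28.26)

$$\int |a|^{\frac12(n-p-2)} \exp\!\left(-\frac{n}{2} A^{ij} a_{ij}\right) \Pi\, da = \frac{K}{|A|^{\frac12(n-1)}}, \tag{28.28}$$

where K is some constant. In this form let us replace $A^{ij}$ by $A^{ij} - i\theta^{ij}/n$ when i ≠ j and
$A^{jj}$ by $A^{jj} - 2i\theta^{jj}/n$ when i = j. Then the resulting integral is the characteristic function of
the a’s, $\theta^{ij}$ being the parameter corresponding to $a_{ij}$. We thus have

$$\phi(\theta^{ij}) = |A|^{\frac12(n-1)}
\begin{vmatrix}
A^{11} - \dfrac{2i\theta^{11}}{n} & A^{12} - \dfrac{i\theta^{12}}{n} & \ldots & A^{1p} - \dfrac{i\theta^{1p}}{n} \\
A^{12} - \dfrac{i\theta^{12}}{n} & A^{22} - \dfrac{2i\theta^{22}}{n} & \ldots & A^{2p} - \dfrac{i\theta^{2p}}{n} \\
\vdots & \vdots & & \vdots \\
A^{1p} - \dfrac{i\theta^{1p}}{n} & A^{2p} - \dfrac{i\theta^{2p}}{n} & \ldots & A^{pp} - \dfrac{2i\theta^{pp}}{n}
\end{vmatrix}^{-\frac12(n-1)}, \tag{28.29}$$

the constant being evaluated by the consideration that φ(0) = 1.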

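Example 28.1

Let us apply these results to an examination of the moments of the distribution of
covariance in the bivariate case. We have

$$A^{11} = A^{22} = \frac{1}{1 - \rho^2}, \qquad A^{12} = -\frac{\rho}{1 - \rho^2}.$$

We then find for the c.f. of $a_{11}, a_{12}, a_{22}$

$$\phi \propto
\begin{vmatrix}
\dfrac{1}{1 - \rho^2} - \dfrac{2i\theta_{11}}{n} & -\dfrac{\rho}{1 - \rho^2} - \dfrac{i\theta_{12}}{n} \\
-\dfrac{\rho}{1 - \rho^2} - \dfrac{i\theta_{12}}{n} & \dfrac{1}{1 - \rho^2} - \dfrac{2i\theta_{22}}{n}
\end{vmatrix}^{-\frac12(n-1)}.$$

We are interested only in the parameter $\theta_{12}$ which we will write as θ, putting the others
equal to zero. We then find

$$\phi \propto \left\{\frac{1}{(1 - \rho^2)^2} - \left(\frac{\rho}{1 - \rho^2} + \frac{i\theta}{n}\right)^2\right\}^{-\frac12(n-1)},$$

$$\phi = \left\{1 - \frac{2i\rho\theta}{n} + \frac{(1 - \rho^2)\,\theta^2}{n^2}\right\}^{-\frac12(n-1)}.$$

Taking logarithms and evaluating coefficients of powers of θ, we find for the cumulants

$$\kappa_1 = \frac{n-1}{n}\, \rho,$$

$$\kappa_2 = \frac{n-1}{n^2}\, (1 + \rho^2),$$

$$\kappa_3 = \frac{2(n-1)}{n^3}\, \rho\, (3 + \rho^2),$$

$$\kappa_4 = \frac{6(n-1)}{n^4}\, (1 + 6\rho^2 + \rho^4).$$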
In standard measure the distribution tends to normality as n tends to infinity. But for
finite n we have

$$\beta_1 = \frac{4}{n-1}\, \frac{\rho^2 (3 + \rho^2)^2}{(1 + \rho^2)^3},$$

$$\beta_2 = 3 + \frac{6}{n-1}\, \frac{1 + 6\rho^2 + \rho^4}{(1 + \rho^2)^2}.$$

Thus, even when ρ = 0 our distribution, though symmetrical, is not normal.

Wishart (1928) has given formulae as far as those of the fourth order for eight or
fewer variates.
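
The cumulants of Example 28.1 may be checked against a general expression. The quadratic $1 - 2\rho u - (1 - \rho^2)u^2$ in the characteristic function (with $u = i\theta/n$) factorises as $\{1 - (1+\rho)u\}\{1 + (1-\rho)u\}$, so that, on expanding the logarithm, $\kappa_r = \tfrac12(n-1)\,(r-1)!\,\{(1+\rho)^r + (-1)^r(1-\rho)^r\}/n^r$. This general formula is our own deduction from the characteristic function, not a statement of the text, and the sketch below (with illustrative values of n and ρ) merely confirms that it reproduces the four cumulants and the values of β₁ and β₂ quoted in the example.

```python
import math

def covariance_cumulant(r, n, rho):
    """r-th cumulant of the sample covariance a_12 (standard measure), derived
    by factorising the quadratic in the characteristic function."""
    return (0.5 * (n - 1) * math.factorial(r - 1)
            * ((1 + rho) ** r + (-1) ** r * (1 - rho) ** r) / n ** r)

def quoted_cumulants(n, rho):
    """The four cumulants in the closed forms given in Example 28.1."""
    return [
        (n - 1) * rho / n,
        (n - 1) * (1 + rho ** 2) / n ** 2,
        2 * (n - 1) * rho * (3 + rho ** 2) / n ** 3,
        6 * (n - 1) * (1 + 6 * rho ** 2 + rho ** 4) / n ** 4,
    ]

n, rho = 10, 0.4                                   # illustrative values
general = [covariance_cumulant(r, n, rho) for r in (1, 2, 3, 4)]
quoted = quoted_cumulants(n, rho)

beta_1 = general[2] ** 2 / general[1] ** 3         # kappa_3^2 / kappa_2^3
beta_2 = 3 + general[3] / general[1] ** 2          # 3 + kappa_4 / kappa_2^2
```

With ρ = 0 the odd cumulants vanish but κ₄ does not, which is the sense in which the distribution is symmetrical without being normal.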


Hotelling’s Distribution 

28.11. In the univariate case we can test the significance of a mean by comparing
it with the estimated standard deviation, the ratio being distributed in “Student’s” form
(or some simple transformation of it if we compare the mean with the actual sample variance
and not the unbiassed estimator). We proceed to generalise this result.

We require a single quantity which will serve as a measure of departure of all the means
$\bar{x}_j$ from the population values which, as usual, we take to be zero. In place of the matrix
of dispersions, we shall consider the matrix of sums of squares and products $(b_{ij})$ where

$$b_{ij} = \sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j). \tag{28.30}$$

As usual we take $(b^{ij})$ to be the matrix inverse to $(b_{ij})$. Let us now write

$$T^2 = n(n-1)\, b^{ij} \bar{x}_i \bar{x}_j. \tag{28.31}$$

This is Hotelling’s generalisation of the “Student” ratio t.

In the simplest case when p = 1 we have

$$b_{11} = n s^2$$

and hence

$$T^2 = \frac{(n-1)\, \bar{x}^2}{s^2}, \tag{28.32}$$

so that T becomes equal to the ratio t as required.


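28.12. We have

$$\frac{T^2}{n-1} = n\, b^{ij} \bar{x}_i \bar{x}_j. \tag{28.33}$$

Let us now denote by $m_{ij}$ the sum of squares or products about the origin, so that

$$m_{ij} = b_{ij} + n \bar{x}_i \bar{x}_j. \tag{28.34}$$

The determinant of $(m_{ij})$ may be written

$$|m_{ij}| = \begin{vmatrix}
1 & \bar{x}_1 \sqrt{n} & \bar{x}_2 \sqrt{n} & \ldots & \bar{x}_p \sqrt{n} \\
0 & b_{11} + n\bar{x}_1^2 & b_{12} + n\bar{x}_1 \bar{x}_2 & \ldots & b_{1p} + n\bar{x}_1 \bar{x}_p \\
0 & b_{12} + n\bar{x}_2 \bar{x}_1 & b_{22} + n\bar{x}_2^2 & \ldots & b_{2p} + n\bar{x}_2 \bar{x}_p \\
\vdots & \vdots & \vdots & & \vdots \\
0 & b_{p1} + n\bar{x}_p \bar{x}_1 & b_{p2} + n\bar{x}_p \bar{x}_2 & \ldots & b_{pp} + n\bar{x}_p^2
\end{vmatrix}.$$

On subtracting $\bar{x}_1 \sqrt{n}$ times the first row from the second, and so on, we find

$$|m_{ij}| = \begin{vmatrix}
1 & \bar{x}_1 \sqrt{n} & \ldots & \bar{x}_p \sqrt{n} \\
-\bar{x}_1 \sqrt{n} & b_{11} & \ldots & b_{1p} \\
\vdots & \vdots & & \vdots \\
-\bar{x}_p \sqrt{n} & b_{p1} & \ldots & b_{pp}
\end{vmatrix},$$

and on expanding according to the border row and column,

$$|m_{ij}| = |b_{ij}|\, (1 + n\, b^{ij} \bar{x}_i \bar{x}_j).$$

It follows that

$$\frac{|m_{ij}|}{|b_{ij}|} = 1 + \frac{T^2}{n-1}, \tag{28.35}$$

or

$$\frac{|b_{ij}|}{|m_{ij}|} = \frac{1}{1 + \dfrac{T^2}{n-1}}. \tag{28.36}$$

This is a fundamental equation in the sampling theory of T and we proceed to interpret
it geometrically.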
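
The determinantal identity of (28.36) is easily confirmed on a small sample. The sketch below, which is an illustration and not part of the original text, takes p = 2 and assumes the definitions $T^2 = n(n-1)\, b^{ij} \bar{x}_i \bar{x}_j$ and $m_{ij} = b_{ij} + n\bar{x}_i \bar{x}_j$ ; the function name and the data are hypothetical. It computes both sides of $|b_{ij}|/|m_{ij}| = 1/\{1 + T^2/(n-1)\}$.

```python
def hotelling_identity(x1, x2):
    """Return (T^2, |b|/|m|, n) for a bivariate sample with hypothetical zero means."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    # Sums of squares and products about the sample means (b_ij).
    b11 = sum((u - m1) ** 2 for u in x1)
    b22 = sum((v - m2) ** 2 for v in x2)
    b12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    det_b = b11 * b22 - b12 ** 2
    # Quadratic form b^{ij} xbar_i xbar_j, using the explicit 2 x 2 inverse.
    quad = (b22 * m1 * m1 - 2 * b12 * m1 * m2 + b11 * m2 * m2) / det_b
    t_squared = n * (n - 1) * quad
    # Sums of squares and products about the origin (m_ij).
    m11 = b11 + n * m1 * m1
    m22 = b22 + n * m2 * m2
    m12 = b12 + n * m1 * m2
    det_m = m11 * m22 - m12 ** 2
    return t_squared, det_b / det_m, n

x1 = [1.2, 0.7, 2.1, 1.6, 0.9]    # illustrative observations on the first variate
x2 = [0.4, 1.1, 0.8, 1.9, 1.3]    # and on the second
t_squared, ratio, n = hotelling_identity(x1, x2)
right_hand_side = 1.0 / (1.0 + t_squared / (n - 1))
```

The ratio necessarily lies between 0 and 1, in accordance with its interpretation as the square of a cosine.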


28.13. In the case p = 1 we have a single sample space of n dimensions. The numerator
and denominator of (28.36) then reduce to $b_{11}$ and $m_{11}$, that is to say, the squares of
distances from the sample-point $P_1$ to its projection on the unit vector whose direction
cosines are all equal, and from $P_1$ to the origin, respectively. The ratio of (28.36) has
zero dimensions and is in fact the square of the sine of the angle between $OP_1$ and the unit
vector. This is the geometrical approach which gave us “Student’s” distribution in
Example 10.6 (vol. I, p. 239).

In the general case let us regard the p n-spaces as superposed in one n-space. The
points $P_1 \ldots P_p$ will lie in a space of p − 1 dimensions, a hyperplane in the n-space.
Now we may rotate the axes without altering the functions $|m_{ij}|$ or $|b_{ij}|$ which are easily
seen to be invariant under orthogonal variate-transformations. If we perform such a
rotation so as to bring the (p − 1)-space of sample-points into correspondence with p − 1
co-ordinate dimensions, we see from (28.20) that $|m_{ij}|$ is the square of the content of a
hyperparallelopiped with one corner at the origin and sides parallel to $OP_1 \ldots OP_p$.

Now consider a hyperplane perpendicular to the unit vector meeting it, say, in O′,
and let $P_1' \ldots P_p'$ be the projections of the points P on to this hyperplane. Then $b_{ij}$
is the covariance of the co-ordinates $P_i'$ and $P_j'$ referred to O′, and hence $|b_{ij}|$ is the square
of the content of the hyperparallelopiped in the hyperplane. Furthermore, the content
of this figure bears to that given by $|m_{ij}|$ a ratio equal to the cosine of the angle between
the unit vector and the hyperplane. Representing this angle by θ, we have

$$\frac{|b_{ij}|}{|m_{ij}|} = \cos^2 \theta. \tag{28.37}$$


28.14. Now if the sample-points P are distributed in the n-space with random
orientation, the hyperplane which they determine will be distributed randomly in regard
to the angle which it makes with a fixed vector, and in particular with the unit vector.
The sampling distribution of θ is then that of an angle between a fixed vector and a random
plane. But this, from a slightly different viewpoint, is precisely the problem of distribution
which we solved in connection with the multiple correlation coefficient R, for we saw (15.18,
vol. I, p. 381) that R is the sine of the angle between a residual vector represented by a
variate $x_1$ and the space containing other variates $x_2 \ldots x_p$ ; and in the case when
the former is independent of the latter we can regard it as fixed. Thus, from (28.37) we
may write

$$\frac{1}{1 + \dfrac{T^2}{n-1}} = 1 - R^2. \tag{28.38}$$

The distribution of $R^2$ in the case when the variate concerned is independent of the
others is

$$dF = \frac{1}{B\{\frac12(n-p),\ \frac12(p-1)\}}\, (R^2)^{\frac12(p-3)}\, (1 - R^2)^{\frac12(n-p-2)}\, d(R^2), \tag{28.39}$$

where we must remember that p is the total number of variates and the variates are measured
from their means in forming the regression equation. Before substituting (28.38) in this
expression we must increase p by unity, since in effect we are considering p + 1 variates,
the unit vector determining an additional one ; and we must also increase n by unity
because our variation is not restricted to that about the mean, as for multiple correlation.
With these alterations in (28.39), we have, on substituting for R from (28.38) and a little
reduction,

$$dF = \frac{2}{(n-1)^{\frac12 p}\, B\{\frac12 p,\ \frac12(n-p)\}}\, \frac{T^{p-1}}{\{1 + T^2/(n-1)\}^{\frac12 n}}\, dT. \tag{28.40}$$

This is the distribution of Hotelling’s generalisation of “Student’s” ratio.

28.15. At the end of the chapter we shall see that this is a particular case of a more
general distribution (28.31). A third and instructive derivation, due to Wilks, is as
follows :

From the manner of derivation of Wishart’s distribution it will be clear that if we
substitute the moments about the origin $a'_{ij}$ for those about the mean $a_{ij}$, the distribution
is the same, except that there is an extra degree of freedom. The distribution is then

$$dF = \frac{(\frac12 n)^{\frac12 pn}\, |A|^{\frac12 n}}{\pi^{\frac14 p(p-1)} \prod_{k=1}^{p} \Gamma\{\frac12(n+1-k)\}}\, |a'|^{\frac12(n-p-1)} \exp\!\left(-\frac{n}{2} A^{ij} a'_{ij}\right) \Pi\, da'.$$

Putting $\frac12 n A^{ij} = B^{ij}$, we find, on integration,

$$\int |a'|^{\frac12(n-p-1)} \exp(-B^{ij} a'_{ij})\, \Pi\, da' = \frac{\pi^{\frac14 p(p-1)} \prod_{k=1}^{p} \Gamma\{\frac12(n+1-k)\}}{|B|^{\frac12 n}}. \tag{28.41}$$

Now replace n by n + 2r in this expression and divide by the term on the right in (28.41).
The result is to give us the r-th moment of |a'| as

$$\mu'_r(|a'|) = \frac{1}{|B|^r} \prod_{k=1}^{p} \frac{\Gamma\{\frac12(n+1-k) + r\}}{\Gamma\{\frac12(n+1-k)\}}. \tag{28.42}$$

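We may also write the distribution of $a_{ij}$ in the form given by our original derivation of
Wishart’s distribution :

$$dF = \frac{|B|^{\frac12(n-1)}}{\pi^{\frac14 p(p-1)} \prod_{k=1}^{p} \Gamma\{\frac12(n-k)\}}\, |a|^{\frac12(n-p-2)} \exp(-B^{ij} a_{ij})\, \Pi\, da
\times \frac{|B|^{\frac12}}{\pi^{\frac12 p}} \exp(-B^{ij} \bar{x}_i \bar{x}_j)\, \Pi\, d\bar{x}.$$

Multiply this by $|a'|^r$, integrate, and use (28.42), transferring constant terms to the right
as in (28.41) ; then replace n by n + 2s and divide by the constant terms as they were
before substitution. We find

$$\mu'_{rs}(|a'|,\ |a|) = \frac{1}{|B|^{r+s}} \prod_{k=1}^{p} \frac{\Gamma\{\frac12(n+1-k) + r + s\}\ \Gamma\{\frac12(n-k) + s\}}{\Gamma\{\frac12(n+1-k) + s\}\ \Gamma\{\frac12(n-k)\}}. \tag{28.43}$$

Now put r = −s and note that

$$\prod_{k=1}^{p} \frac{\Gamma\{\frac12(n-k) + s\}}{\Gamma\{\frac12(n+1-k) + s\}} = \frac{\Gamma\{\frac12(n-p) + s\}}{\Gamma(\frac12 n + s)}.$$

We find

$$\mu'_s\!\left(\frac{|a|}{|a'|}\right) = \frac{\Gamma\{\frac12(n-p) + s\}\ \Gamma(\frac12 n)}{\Gamma(\frac12 n + s)\ \Gamma\{\frac12(n-p)\}}
 = \frac{B\{\frac12(n-p) + s,\ \frac12 p\}}{B\{\frac12(n-p),\ \frac12 p\}}. \tag{28.44}$$

Now the function on the right is the s-th moment of

$$dF = \frac{1}{B\{\frac12(n-p),\ \frac12 p\}}\, z^{\frac12(n-p-2)}\, (1 - z)^{\frac12(p-2)}\, dz, \tag{28.45}$$

which is uniquely determined by its moments. This, then, is the distribution of the ratio
$|a|/|a'|$ and on substitution in terms of T from (28.36) brings us back to the distribution of
(28.40). Incidentally this method gives us one more derivation of the distribution of
multiple correlations and correlation ratios when the respective variates are independent.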


Significance of a Set of Means 

28.16. Suppose that we have a set of k samples with numbers $n_1 \ldots n_k$, each
from a p-variate population. Let us also suppose that the populations have the same
dispersion matrix but different means, that of the i-th variate in the l-th sample being $\mu_{i(l)}$.
We proceed to derive a criterion for testing the means simultaneously. Our result is a
generalisation of the testing of k means in normal samples, and we shall obtain it by applying
the same method, namely by using the likelihood criterion

$$\lambda = \frac{p_0(\omega\ \text{max.})}{p_1(\Omega\ \text{max.})}$$

as given in equation (26.64). Here ω is the domain for which all the means of the i-th
variate have a common value and Ω that for which they have the more general values
$\mu_{i(l)}$.


Let $b_{ij(l)}$ be the function $b_{ij}$ for the l-th sample (l = 1, 2, ... k) and $\bar{x}_{i(l)}$ the mean
of the i-th variate in that sample. Put

k 

^ bij (y 

7s=1 

. (28.46) 

where, of course, 

It JL 



rii 

{.D ~ (^it n) (1)) m {?))• ■ 

. (28.47) 

Put, for the functions of the pooled samples. 



» 1 1 ^ 

~ ^ (1) ^ ~ ^ (/) • 
t,l ff' L 

. (28.48) 

If then 

(1) — ^ i^it it) Mi (/)) i^jt {I'j Mj {/)) 

. (28.49) 

. (28.50) 

the likelihood of all 

samples together is 



c 1 A exp {— i X {ni A-' (,)) }, . 

. (28.51) 


where c is a constant. 

Taking logarithms and differentiating, we have for the maximum value equations 
typified by 

L Ui { {x^f — ft I (;)) -f- [Xjf fij (^)) } == 0, 

which reduce to 

;f.,- (,) ~ Pi (i)- ■ • . . - . (28.52) 

The maximum likelihood values of the m’s are then given by 

■mjj bfj . 

Furthermore, the values of A'-' are then given by the inverse of the matrix ^ the 

exponent of (28.51) becomes 

— In Z bij (i)) = — Ink. .... (28.53) 

We then find 

c 


■px {Q max.) 


In a similar way it will be found that 

Po {cti max.) 


n 




|in' 


. (28.54) 


C Q-lnk 


1 , 

^71 

n ^ 



. (28.55) 





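Hence

    $\lambda = \left\{ \dfrac{|b_{ij}|}{|a'_{ij}|} \right\}^{\frac{1}{2}n}$

and we may write

    $L = \lambda^{2/n} = \dfrac{|b_{ij}|}{|a'_{ij}|}$                                       (28.56)

and take $L$ as our criterion.
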

28 . 17 . The distribution of L for general k is not easily expressible, but we may 
determine its moments by the method employed in 28 . 15 . The functions are dis- 
tributed in Wishart’s form and their moments accordingly given by equations of the type 
(28.42) with n replaced by n — 1, namely, 


Mr ( ) 




B 


p 

n 

1 


r\ 


n 


m 




r\ 


n 


m 


(28.57) 


Now each is distributed in Wishart’s form, and therefore their sum is so distributed 
(cf. Exercise 28.3). In the manner of 28.15 — we omit the details — it is found that 


Mr 




. (28.58) 


where we now use m as an index of summation, reserving k for the number of samples. 
This gives us the moments of L. 

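In the case $k = 2$ we have

    $M_r = \dfrac{\Gamma\{\tfrac{1}{2}(n-1)\}\, \Gamma\{\tfrac{1}{2}(n-p-1) + r\}}{\Gamma\{\tfrac{1}{2}(n-p-1)\}\, \Gamma\{\tfrac{1}{2}(n-1) + r\}}$        (28.59)

and hence the distribution of $L$ is in the form

    $dF = \dfrac{1}{B\{\tfrac{1}{2}(n-p-1),\ \tfrac{1}{2}p\}}\, L^{\frac{1}{2}(n-p-3)} (1 - L)^{\frac{1}{2}(p-2)}\, dL$.      (28.60)

In the case $k = 3$ we find

    $M_r = \dfrac{\Gamma\{\tfrac{1}{2}(n-1)\}\, \Gamma\{\tfrac{1}{2}(n-2)\}\, \Gamma\{\tfrac{1}{2}(n-p-1) + r\}\, \Gamma\{\tfrac{1}{2}(n-p-2) + r\}}{\Gamma\{\tfrac{1}{2}(n-p-1)\}\, \Gamma\{\tfrac{1}{2}(n-p-2)\}\, \Gamma\{\tfrac{1}{2}(n-1) + r\}\, \Gamma\{\tfrac{1}{2}(n-2) + r\}}$

which, in virtue of the relation

    $\Gamma(x + \tfrac{1}{2})\, \Gamma(x + 1) = \dfrac{\sqrt{\pi}\, \Gamma(2x + 1)}{2^{2x}}$,

becomes

    $M_r = \dfrac{\Gamma(n-2)\, \Gamma(n-p-2+2r)}{\Gamma(n-p-2)\, \Gamma(n-2+2r)}$.                   (28.61)

These are the moments of the distribution

    $dF = \dfrac{1}{2B(n-p-2,\ p)}\, (\sqrt{L})^{\,n-p-4} (1 - \sqrt{L})^{\,p-1}\, dL$,          (28.62)

a rather unusual form. The results are due to Wilks.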
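
As a numerical illustration of the criterion of 28.16, the following is a minimal sketch in Python. It assumes that $L$ is the ratio of the determinant of the within-sample sums of squares and products to the determinant of the pooled sums about the general mean (the common factor $1/n$ cancels). The helper names `det`, `ssp`, `mean` and `wilks_L`, and the small bivariate data set, are hypothetical and serve only to show the arithmetic.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def mean(rows):
    """Vector of variate means for a list of observation vectors."""
    p = len(rows[0])
    return [sum(x[i] for x in rows) / len(rows) for i in range(p)]


def ssp(rows, centre):
    """Sums of squares and products of the rows about the point `centre`."""
    p = len(centre)
    out = [[0.0] * p for _ in range(p)]
    for x in rows:
        for i in range(p):
            for j in range(p):
                out[i][j] += (x[i] - centre[i]) * (x[j] - centre[j])
    return out


def wilks_L(samples):
    """L = |within-sample SSP| / |pooled SSP about the general mean|."""
    p = len(samples[0][0])
    within = [[0.0] * p for _ in range(p)]
    for rows in samples:
        s = ssp(rows, mean(rows))
        for i in range(p):
            for j in range(p):
                within[i][j] += s[i][j]
    pooled = [x for rows in samples for x in rows]
    total = ssp(pooled, mean(pooled))
    return det(within) / det(total)


# Hypothetical data: k = 2 samples of four members, p = 2 variates.
first = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
shifted = [[x + 5.0, y + 5.0] for x, y in first]   # same dispersion, new means
same = [row[:] for row in first]                   # identical means

L_shifted = wilks_L([first, shifted])
L_same = wilks_L([first, same])
print(L_shifted, L_same)
```

When the sample means coincide the two determinants are equal and $L$ is unity; separation of the means inflates the pooled determinant and carries $L$ towards zero, so that small values of $L$ are the significant ones.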

28.18. The line of generalisation of univariate analysis will now probably be clear.
Corresponding to most of our results for a single variate there will be a generalised result
for $p$ variates; and, in fact, if we like to regard the $p$-variate as a vector we can often draw
direct analogies between results for vectors and those for the (univariate) scalar. It is
of special interest to observe that the role played by the variance in univariate theory is
taken over by the determinant of the dispersion matrix in multivariate theory.

Up to this point we have generalised the distribution of variance (the $\chi^2$-distribution)
into Wishart's form, and the $t$-distribution into Hotelling's form.

Other results which suggest themselves for generalisation are regression and variance
analysis. But in a sense our treatment of regressions is already general, for we have
discussed the regression of one variate on $p - 1$ others. Below we shall go further and
examine the relations between $p$ dependent and $q$ independent variates. In vector
language, we consider the regression of a $p$-way vector $y$ on a $q$-way vector $x$. We have also
considered the analysis of variance for the bivariate and trivariate case in Chapter 24
under the title of analysis of covariance, and since the interest lies mainly in the direction
of regressions we shall not take the subject further here, though it is capable of
development and even, perhaps, of application if data become available in sufficient abundance.
In the remainder of the chapter we shall, in the first instance, deal with an offshoot of
regression theory which has some interesting taxonomic applications, namely
discriminatory analysis; and we shall then proceed to the general problem of the relationship between
two sets of variates.


Discriminatory Analysis 

28.19. Suppose we have $p$ observations for each of $2n$ sample members, and that
each member can have emanated from one of two populations, $n$ to each population. We
require to find some measurement depending on the $p$ observations which will enable us
to assign subsequently drawn members correctly to their parent populations with the
greatest assurance of success. For this purpose we shall find $p$ quantities $l^1, \ldots, l^p$ and
a discriminant function $X$ related linearly to the variates by

    $X = l^j x_j$.                                                  (28.63)

The criterion on which we shall rely is that the $l$'s must be chosen to maximise the ratio
of the difference between sample means to the standard deviation within the two classes.

Any linear function of type (28.63) has variance $S$, given by

    $S = l^i l^j a_{ij}$,                                              (28.64)

where, as usual, $a_{ij}$ is the covariance of $x_i$ and $x_j$, which we assume to be the same for both
populations. Further, if the difference of the two means of $x_j$ is $d_j$, the difference of the
function $X$ for the two samples is

    $D = l^j d_j$.                                                  (28.65)

We have then to maximise for variation in $l$ the function

    $\dfrac{D^2}{S} = \dfrac{(l^j d_j)^2}{l^i l^j a_{ij}}$.                                    (28.66)

This gives for each $l$

    $\dfrac{1}{2} \dfrac{\partial S}{\partial l} = \dfrac{S}{D} \dfrac{\partial D}{\partial l}$,

leading to equations typified by

    $l^j a_{ij} = \dfrac{S}{D}\, d_i$.                                          (28.67)

Multiplying by $a^{ik}$ and summing over $i$, we have

    $l^j a_{ij} a^{ik} = l^k = \dfrac{S}{D}\, d_i a^{ik}$

or, replacing $k$ by $j$,

    $l^j = \dfrac{S}{D}\, d_i a^{ij}$.                                          (28.68)

This determines the $l$'s, except for the constant $S/D$, which can be chosen at will so far as the
discriminant function is concerned. If $c$ is some constant, we have

    $l^j = c\, d_i a^{ij}$.                                              (28.69)

The result also holds if there are $n_1$ members in the first sample and $n_2$ in the second.
Equation (28.65) remains true, and the rest of the analysis is the same as for equal
class-numbers.


Example 28.2 (from R. A. Fisher, 1936a).

Measurements were made on fifty specimens of flowers from each of two species of
iris, setosa and versicolor, found growing in the same colony. Four measurements were
taken, viz. sepal length, sepal width, petal length, and petal width. We denote them by
$x_1$, $x_2$, $x_3$ and $x_4$ respectively.

The means of the specimens were (in centimetres):

    Variate.    Versicolor.    Setosa.    Difference (V - S).
    x_1           5.936         5.006          0.930
    x_2           2.770         3.428         -0.658
    x_3           4.260         1.462          2.798
    x_4           1.326         0.246          1.080

The sums of squares and products about the means were (in cm.$^2$):

             x_1        x_2        x_3        x_4
    x_1    19.1434     9.0356     9.7634     3.2394
    x_2     9.0356    11.8658     4.6232     2.4746
    x_3     9.7634     4.6232    12.2978     3.8794
    x_4     3.2394     2.4746     3.8794     2.4604

The inverse matrix is, in cm.$^{-2}$:

              x_1           x_2           x_3           x_4
    x_1    0.1187161    -0.0668666    -0.0816158     0.0396350
    x_2   -0.0668666     0.1452736     0.0334101    -0.1107529
    x_3   -0.0816158     0.0334101     0.2193614    -0.2720206
    x_4    0.0396350    -0.1107529    -0.2720206     0.8945506

We need not bother to divide these quantities by $n$ because there is an arbitrary
constant in our discriminant function which absorbs it. The matrices are diagonally
symmetric, and it is not always necessary to write out the values below the diagonal as we
have done here.

From (28.69), with $c = 1$, we then find

    $l^1 = -0.0311511$,    $l^2 = -0.1839075$,
    $l^3 = 0.2221044$,     $l^4 = 0.3147370$.

If we choose the coefficient of $x_1$ to be unity the discriminant function is then

    $X = x_1 + 5.9037\, x_2 - 7.1299\, x_3 - 10.1036\, x_4$.                 (28.70)

The mean of $X$ for versicolor, obtained by substituting the means of the $x$'s for that species,
is found to be -21.4815, and that for setosa is 12.3345. The difference is thus 33.816 cm.
Let us compare this with its standard error to see whether it is significant of real differences
in the values of $X$ for the two species.

From the matrix of sums of squares and products we find

    $N\ \mathrm{var}\, X = l^i l^j a_{ij} = 1085.5522$,

where the $l$'s are, of course, the coefficients in (28.70). $N$ here is the number of degrees
of freedom of the estimate of the variance. There are 100 members altogether, with 99
degrees of freedom, but we have eliminated four corresponding to the means of the four
variates. We therefore take $N$ to be 99 - 4 = 95, and find

    $\mathrm{var}\, X = 11.4269$.

This is the variance of a single value. That of the difference of the two means of 50 values
is obtained by division by 25 and is thus 0.4571, the corresponding standard error being
0.676.

The observed difference of means, viz. 33.816, is about 50 times this amount, and
there is thus a real difference in the values of $X$ for the two species. In other words the
discriminant function is a good one. It is best among the linear functions of the $x$'s because
we have chosen it so that the difference of two values, divided by their estimated standard
error, shall be the greatest possible. To use the function we should, given a flower of
doubtful species, calculate $X$ for it and assign it to one species or the other according as
$X$ were nearer to the mean value of $X$ for one species or the other. If, of course,
the observed value differed from the mean values by more than twice the standard error
of each, we should begin to doubt whether it belonged to either.

The analysis may be put in rather a different way. Suppose we analyse the variation
of $X$ between and within species. The sum of squares between species in the 50 × 2
classification is

    $50\, \{(\bar{X}_1 - \bar{X})^2 + (\bar{X}_2 - \bar{X})^2\}$,

where $\bar{X}_1$, $\bar{X}_2$ are the respective means and $\bar{X}$ the mean of the whole. This reduces to $25D^2$.
The sum of squares within classes is 1085.55 with 95 d.f., as found above, and we have:

                          Sum of Squares.     d.f.
    Between species          28,588.05          4
    Within species            1,085.55         95
    Totals                   29,673.60         99

Our method of selecting the discriminant function has been such as to minimise the sum
of squares within species and, for constant total, to maximise the sum between species,
and hence to maximise the ratio of the latter to the former. For the moment we cannot
assume that this ratio may be tested in the $z$-distribution in the usual way, though we shall
see presently that this is so.
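
The arithmetic of Example 28.2 can be retraced with a short Python sketch. It assumes only the sums of squares and products, the species means and the differences quoted in the example; the function `solve` is a hypothetical helper (plain Gauss-Jordan elimination), not a library routine, and the variable names are our own.

```python
def solve(a, b):
    """Solve the linear equations a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


# Sums of squares and products within species (cm^2), as quoted in the example.
A = [[19.1434, 9.0356, 9.7634, 3.2394],
     [9.0356, 11.8658, 4.6232, 2.4746],
     [9.7634, 4.6232, 12.2978, 3.8794],
     [3.2394, 2.4746, 3.8794, 2.4604]]
versicolor = [5.936, 2.770, 4.260, 1.326]
setosa = [5.006, 3.428, 1.462, 0.246]
d = [v - s for v, s in zip(versicolor, setosa)]      # differences V - S

l = solve(A, d)                       # (28.69) with c = 1
coef = [v / l[0] for v in l]          # coefficient of x_1 made unity, as in (28.70)

mean_v = sum(c * x for c, x in zip(coef, versicolor))
mean_s = sum(c * x for c, x in zip(coef, setosa))
D = mean_s - mean_v                   # difference of the two means of X

within = sum(coef[i] * coef[j] * A[i][j] for i in range(4) for j in range(4))
var_X = within / 95                   # 95 degrees of freedom
se_diff = (var_X / 25) ** 0.5         # variance of a difference of two means of 50

print(l)
print(coef)
print(mean_v, mean_s, D, within, se_diff)
```

Solving the equations directly avoids writing out the inverse matrix, and the results should agree with the figures of the example to the accuracy quoted there.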

28.20. The relationship of discriminatory analysis for two classes and the theory of
regression may be brought out by introducing a formal variate $y$ for the classes. If there
are $n_1$ members in one class and $n_2$ in the other we shall assign the values

    $\dfrac{n_2}{n_1 + n_2}$,    $-\dfrac{n_1}{n_1 + n_2}$

to the $y$-variate for the two classes respectively. The mean of $y$ for the whole sample is
then zero and the sum of squares is

    $\dfrac{n_1 n_2}{n_1 + n_2} = C$, say.                                      (28.71)

Considering now

    $Y = l^i x_i$                                                     (28.72)

as a regression equation, we find for the coefficients $l$

    $\Sigma (y x_j) - l^i\, \Sigma (x_i x_j) = 0$,

or

    $\Sigma (y x_j) - l^i a_{ij} = 0$.                                        (28.73)

Now

    $\Sigma (y x_j) = \dfrac{n_2}{n_1 + n_2}\, \Sigma_1 (x_j) - \dfrac{n_1}{n_1 + n_2}\, \Sigma_2 (x_j)$,

where the suffixes of the $\Sigma$'s relate to the first and second classes,

    $= \dfrac{n_1 n_2}{n_1 + n_2}\, \{(\bar{x}_j)_1 - (\bar{x}_j)_2\}$

    $= C d_j$.

Thus
    $C d_j = l^i a_{ij}$,                                                (28.74)

which is another way of writing (28.69) with a particular value for the constant $c$.


28.21. Pursuing the analogy with regression analysis further, we see that since

    $\Sigma (Y^2) = l^i l^j a_{ij} = C\, l^j d_j$    and    $\Sigma (y^2) = C$

we may analyse the sums of squares as

                      Sums of squares.          d.f.
    Regression        $C\, l^j d_j$                 $p$
    Residual          $C\, (1 - l^j d_j)$           $n_1 + n_2 - p - 1$
    Total             $C$                        $n_1 + n_2 - 1$             (28.75)

as for a regression line. If $R$ is the multiple correlation of $y$ on the $x$-variates,

    $R^2 = l^j d_j$.                                                  (28.76)

In ordinary regression analysis we may test the ratio $R^2/(1 - R^2)$, multiplied by
suitable constants, in the $z$-distribution; but this depends on the assumption that the
dependent variate $y$ is normal for any fixed $x$'s. Here we have the case when the dependent
variate is fixed but the $x$'s are normal. The test still holds in such a case, the reason being
the kind of duality we noted in 28.14 in arriving at Hotelling's distribution. The
distribution of angles between a fixed plane and a random vector is the same as that between
a fixed vector and a random plane. Consequently the table of (28.75) can be regarded
as an analysis of variance and the $z$-test applied.
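
The equivalence of 28.20 and 28.21 may be checked numerically on the iris figures of Example 28.2. The sketch below rests on one assumption that should be stated plainly: the regression sums of products are taken about the general mean of all 100 flowers, so that they exceed the within-species sums by $C d_i d_j$. The helper `solve` and the variable names are hypothetical.

```python
def solve(a, b):
    """Solve the linear equations a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


# Within-species sums of squares and products and mean differences (Example 28.2).
W = [[19.1434, 9.0356, 9.7634, 3.2394],
     [9.0356, 11.8658, 4.6232, 2.4746],
     [9.7634, 4.6232, 12.2978, 3.8794],
     [3.2394, 2.4746, 3.8794, 2.4604]]
d = [0.930, -0.658, 2.798, 1.080]
n1 = n2 = 50
C = n1 * n2 / (n1 + n2)               # sum of squares of the formal variate y

# Assumed sums about the general mean: within part plus between part C d_i d_j.
T = [[W[i][j] + C * d[i] * d[j] for j in range(4)] for i in range(4)]

l_reg = solve(T, [C * dj for dj in d])    # normal equations, C d_j = l^i a_ij
l_disc = solve(W, d)                      # discriminant coefficients, c = 1

ratio_reg = [v / l_reg[0] for v in l_reg]
ratio_disc = [v / l_disc[0] for v in l_disc]
R2 = sum(li * di for li, di in zip(l_reg, d))   # R^2 = l^j d_j, as in (28.76)

print(ratio_reg)
print(ratio_disc)
print(R2)
```

The two sets of ratios should coincide, the regression on the formal variate changing only the arbitrary constant; and $R^2$ should agree with the quotient of the between-species sum of squares by the total in the analysis of Example 28.2.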

28.22. We may extend the discriminant function to the case when the property to
be discriminated is not, as above, a matter of allocation to one of two classes, but to several
which may in particular be determined by certain values of a continuous variate. If we
have various measurements of $p$ $x$-variates corresponding to values of a $y$-variate, we may
form the regression of $y$ on the $x$'s and use the resulting function as a discriminator. As
in the case of dichotomy, the regression will maximise the difference between classes as
compared with intra-class variation; and its significance may be tested in much the
same way.

Example 28.3 (from M. M. Barnard, 1935).

An investigation was undertaken into the changes taking place over time of the
characteristics of certain Egyptian skulls. There were four sets of skulls, known to be from
Late Predynastic, Sixth to Twelfth, Twelfth to Thirteenth and Ptolemaic dynasties
respectively, and the relative time-intervals were taken to be in the proportions 2 : 1 : 2, so that
the values of $t$ for the four periods may be taken to be respectively -5, -1, +1, +5.
For the skulls four measurements were selected:

    $x_1$, basi-alveolar length;
    $x_2$, nasal height;
    $x_3$, maximum breadth;
    $x_4$, basi-bregmatic height.

It is required to find a function

    $X = l^1 x_1 + l^2 x_2 + l^3 x_3 + l^4 x_4$

which will best discriminate between skulls belonging to different periods.

The means of the series were as follows, the sample numbers also being shown:

    Variate.     Series I        Series II       Series III      Series IV
                 (n_1 = 91).     (n_2 = 162).    (n_3 = 70).     (n_4 = 75).
    x_1          133.582418      134.265432      134.371429      135.306667
    x_2           98.307692       96.462963       95.857143       95.040000
    x_3           50.835165       51.148148       50.100000       52.093333
    x_4          133.000000      134.882716      133.642857      131.466667

The sums of squares and products about the means are:

              x_1            x_2            x_3            x_4
    x_1    9661.997470     445.573301    1130.623900    2148.584219
    x_2        ...        9073.115027    1239.221990    2255.812722
    x_3        ...            ...        3938.320351    1271.054662
    x_4        ...            ...            ...        8741.508829


The mean value of $t$, $\bar{t}$, for the 398 observations is -0.432161, and the values of $t - \bar{t}$
for the four series are accordingly

    -4.567839;  -0.567839;  1.432161;  5.432161.

The sums $\Sigma x_j (t - \bar{t})$ are respectively

    x_1      718.76286
    x_2    -1407.26075
    x_3      410.10194
    x_4     -733.428

and finally, $\Sigma (t - \bar{t})^2 = 4307.66832$.

We could obtain the coefficients $l$ from the reciprocal of the matrix above on the lines
of the previous example. It is also instructive to observe, from the analogy with
regressions, that instead of that matrix we may use the matrix (depending on one extra degree
of freedom, 395 in all) obtained by adding to the sums of squares the regressions on time.
For instance, instead of 9661.997470 we have $9661.997470 + (718.76286)^2/4307.66832$.
The resulting matrix is

              x_1            x_2            x_3            x_4
    x_1    9781.927828     210.762489    1199.052135    2026.206952
    x_2        ...        9532.849476    1105.246827    2495.414318
    x_3        ...            ...        3977.363203    1201.230304
    x_4        ...            ...            ...        8866.382928

The reciprocal of this is (units = $10^{-6}$):

              x_1            x_2            x_3            x_4
    x_1     110.368975       6.938481     -28.145236     -23.361935
    x_2        ...         115.693529     -24.948984     -30.767069
    x_3        ...            ...         273.988409     -23.666691
    x_4        ...            ...            ...         129.990069


The resulting values of $l$ are

    $l^1 = 0.075156739$,     $l^2 = -0.145490050$,
    $l^3 = 0.144600884$,     $l^4 = -0.078538419$

and these, or constant multiples of them, give us the constants in the discriminant function
which will best enable us to assign a skull to the correct period by measurements of the
four specified variates.

In this analysis we have 398 members, but of the 397 d.f. we have discarded two with
the general mean. The d.f. of the sum $4307.6683 = \Sigma (t - \bar{t})^2$ are 395, of which four are
attributable to regressions on the other variates. For the contribution of these four we
have

    $l^1 \times 718.76286 + \text{etc.} = 375.6657$.

The analysis of variance is thus:

                    Sum of Squares.     d.f.     Quotient.
    Regression          375.6657          4
    Remainder          3932.0026        391       10.0563
    Totals             4307.6683        395


The analogy of the discriminant function with regressions noted above may be used
to provide standard errors of the coefficients $l$. In our present case the variance of $l^1$
is obtained by multiplying the remainder quotient, viz. 10.0563, by the term corresponding
to $x_1$ in the reciprocal matrix of sums of squares of the $x$'s, namely $110.368975 \times 10^{-6}$.
This gives a standard error of 0.0333. We obtain finally

    $l^1 = 0.0752 \pm 0.0333$
    $l^2 = -0.1455 \pm 0.0341$
    $l^3 = 0.1446 \pm 0.0525$
    $l^4 = -0.0785 \pm 0.0362$.

All coefficients exceed twice their standard error, and hence all the variates are useful in
discriminating between skulls of different periods.
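
The standard errors just quoted follow from a few lines of arithmetic, sketched here in Python. The figures are those of the example; the remainder degrees of freedom, 391, are taken as 395 less the four attributed to the regression, and the variable names are our own.

```python
import math

# Remainder sum of squares and its degrees of freedom (395 - 4).
quotient = 3932.0026 / 391

# Diagonal terms of the reciprocal matrix (units of 10^-6) for x_1 ... x_4.
diagonal = [110.368975e-6, 115.693529e-6, 273.988409e-6, 129.990069e-6]
coefficients = [0.075156739, -0.145490050, 0.144600884, -0.078538419]

# Standard error of each coefficient: sqrt(remainder quotient x diagonal term).
standard_errors = [math.sqrt(quotient * v) for v in diagonal]

# Ratio of each coefficient to its standard error.
ratios = [abs(c) / s for c, s in zip(coefficients, standard_errors)]

for c, s, r in zip(coefficients, standard_errors, ratios):
    print(round(c, 4), round(s, 4), round(r, 2))
```

Each ratio exceeds 2, which is the basis of the statement that all four variates contribute to the discrimination.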

I am indebted to Dr. M. S. Bartlett for the calculations of this example. His results 
differ from those reached by Miss Barnard in her original investigation since she took an 
unweighted regression of the variates with time, whereas he has weighted the values 
according to sample numbers. He also notes that the significance of the results has been 
tested above on the basis of variability within classes, but that a fuller analysis of the means, 
bringing back the two degrees of freedom discarded, reveals further differences between the 
series. Thus, though the discriminant function will efficiently sort the series examined in 
relation to their periods, we must be cautious about associating the observed differences 
with the time-changes. 


Canonical Correlations 


28.23. We now turn to consider the general theory of the relations between two
sets of variates $x_1, \ldots, x_p$ and $x_{p+1}, \ldots, x_{p+q}$, where we suppose that $p \le q$. Following
Hotelling (1936b), we shall show that in general there can be found linear transformations
to variates $\xi_1, \ldots, \xi_p$, $\xi_{p+1}, \ldots, \xi_{p+q}$ such that

(a) all the $\xi$'s have unit variance and zero mean;

(b) any $\xi$ in the $p$-group is independent of the other $\xi$'s in that group;

(c) any $\xi$ in the $q$-group is independent of the other $\xi$'s in that group;

(d) the correlation between any $\xi$ in the $p$-group and any $\xi$ in the $q$-group is zero except
for $p$ correlations $\rho_1, \ldots, \rho_p$, which may be taken to be the correlations between
$\xi_1$ and $\xi_{p+1}$, $\xi_2$ and $\xi_{p+2}$, ..., $\xi_p$ and $\xi_{2p}$.

The variates $\xi$ are then said to be canonical variates and the $\rho$'s canonical correlations.

This part of our work is, fundamentally, the reduction of two quadratic forms and an
associated bilinear form to canonical types and does not depend on the distribution laws
of the variates. Furthermore, the reduction can be carried out either on the population
or on the sample. In the latter case it will yield sample canonical correlations which may
be written $r_1, \ldots, r_p$ and regarded as sample-values of the parent $\rho$'s.

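We will suppose that our variates $x$ have zero means and dispersions denoted by $\sigma_{ij}$,
where, for the time being, we use $\sigma$ to denote a variance or covariance instead of the more
usual notation. Those dispersions in the $p$-group we denote by Greek affixes: $\sigma_{\alpha\beta}$, and those
in the $q$-group by Roman affixes: $\sigma_{ij}$. For a covariance of a $p$-variate with a $q$-variate
we write one Greek and one Roman affix: $\sigma_{\alpha i}$.

Consider now a particular pair of variates given by

    $\xi = c^{\alpha} x_{\alpha}$,    $\eta = d^{i} x_{i}$.                                     (28.77)

If their variances are unity we have

    $c^{\alpha} c^{\beta} \sigma_{\alpha\beta} = 1$,    $d^{i} d^{j} \sigma_{ij} = 1$.                          (28.78)

We will also impose the condition that their correlation $R$ is stationary for variations in
the coefficients $c$ and $d$, i.e. that

    $R = c^{\alpha} d^{i} \sigma_{\alpha i}$ = stationary.                                   (28.79)

Equations (28.78) and (28.79) then require an unconditioned stationary value of

    $c^{\alpha} d^{i} \sigma_{\alpha i} - \tfrac{1}{2} \lambda\, c^{\alpha} c^{\beta} \sigma_{\alpha\beta} - \tfrac{1}{2} \mu\, d^{i} d^{j} \sigma_{ij}$,              (28.80)

where $\lambda$ and $\mu$ are undetermined multipliers. This leads to

    $\sigma_{\alpha i}\, c^{\alpha} - \mu\, \sigma_{ij}\, d^{j} = 0$,
    $\sigma_{\alpha i}\, d^{i} - \lambda\, \sigma_{\alpha\beta}\, c^{\beta} = 0$.                                 (28.81)

Multiplying the first equation by $d^{i}$ and summing and the second by $c^{\alpha}$ and summing,
we have, in virtue of (28.78) and (28.79),

    $R = \lambda = \mu$.                                                (28.82)

Equations (28.81) will then be soluble for the $p + q$ unknowns $c$ and $d$ if the determinant
of their array vanishes, that is if, writing $\lambda$ for the constants $\lambda$ and $\mu$,

    $\begin{vmatrix}
    -\lambda\sigma_{11} & \cdots & -\lambda\sigma_{1p} & \sigma_{1,p+1} & \cdots & \sigma_{1,p+q} \\
    \vdots & & \vdots & \vdots & & \vdots \\
    -\lambda\sigma_{p1} & \cdots & -\lambda\sigma_{pp} & \sigma_{p,p+1} & \cdots & \sigma_{p,p+q} \\
    \sigma_{p+1,1} & \cdots & \sigma_{p+1,p} & -\lambda\sigma_{p+1,p+1} & \cdots & -\lambda\sigma_{p+1,p+q} \\
    \vdots & & \vdots & \vdots & & \vdots \\
    \sigma_{p+q,1} & \cdots & \sigma_{p+q,p} & -\lambda\sigma_{p+q,p+1} & \cdots & -\lambda\sigma_{p+q,p+q}
    \end{vmatrix} = 0$,                                                (28.83)

an equation determining $\lambda$. Before studying it further we will throw the equation into
a somewhat different form.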

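28.24. We may write (28.83) as

    $\begin{vmatrix} -\lambda\sigma_{\alpha\beta} & \sigma_{\alpha j} \\ \sigma_{i\beta} & -\lambda\sigma_{ij} \end{vmatrix} = 0$.                                  (28.84)

Multiplying the first $p$ rows by $-\lambda$ and dividing the last $q$ columns by $-\lambda$ we find the
equivalent form

    $(-\lambda)^{q-p} \begin{vmatrix} \lambda^2\sigma_{\alpha\beta} & \sigma_{\alpha j} \\ \sigma_{i\beta} & \sigma_{ij} \end{vmatrix} = 0$.                            (28.85)

Writing, in conformity with our usual notation, $(\sigma^{ij})$ for the matrix inverse to $(\sigma_{ij})$ and
remembering that

    $\sigma^{ik} \sigma_{kj} = \delta^{i}_{j}$,

let us multiply (28.85) on the left by

    $\begin{vmatrix} \delta^{\gamma}_{\alpha} & -\sigma_{\alpha k}\sigma^{ki} \\ 0 & \sigma^{ij} \end{vmatrix}$.                                        (28.86)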





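The product of determinants is then

    $\begin{vmatrix} \lambda^2\sigma_{\alpha\beta} - \sigma_{\alpha k}\sigma^{ki}\sigma_{i\beta} & 0 \\ \sigma^{ik}\sigma_{k\beta} & \delta^{i}_{j} \end{vmatrix}$,

which gives

    $\lambda^{q-p}\, |\, \lambda^2\sigma_{\alpha\beta} - \sigma_{\alpha i}\sigma^{ij}\sigma_{j\beta}\, | = 0$,                          (28.87)

a determinant with $p$ rows and columns multiplied by a power of $\lambda$.
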


28.25. Returning now to our original problem, we see that if a simple root of (28.83) 
is substituted in (28.81) the c’s and d’s are determinate, except of course that they may be 
replaced by c and d. Eor a root of multiplicity m they are determinate except for 
^ ^ assignable constants, a result we take without proof from the theory of algebraic 

forms (reference may be made to Hotelling’s paper for details). 

From (28.87) we see that the equation in $\lambda$ has $p + q$ roots. It cannot have fewer,
for the coefficient of the highest power of $\lambda$ in (28.83) is the product of two principal minors
which do not vanish unless the variates are linearly dependent, a case which we exclude
from the discussion. Of these $p + q$ roots $q - p$ are zero. The remaining $2p$ can be
grouped in pairs, each of which is the negative of the other. There are thus $2p$ roots which
we may write $\pm\rho_1, \ldots, \pm\rho_p$. We choose as the $\rho$'s those which are not negative and
proceed to prove that they are the canonical correlations as we have defined them. That
they are, in fact, correlations follows from (28.82).

Suppose we have a root $\rho_\gamma$ and determine the corresponding constants $c^{\alpha}_{\gamma}$ and $d^{i}_{\gamma}$ and
hence a pair of variates $\xi_\gamma$ and $\eta_\gamma$. Then we have, from (28.81),

    $\sigma_{\alpha i}\, c^{\alpha}_{\gamma} = \rho_\gamma\, \sigma_{ij}\, d^{j}_{\gamma}$,
    $\sigma_{\alpha i}\, d^{i}_{\gamma} = \rho_\gamma\, \sigma_{\alpha\beta}\, c^{\beta}_{\gamma}$.                                   (28.88)

Similar equations obtain for a second pair, say $\xi_\delta$ and $\eta_\delta$. Between these four variates
there are six correlations, two of which are $\rho_\gamma$ and $\rho_\delta$. We wish to show that the other
four vanish. They are

    $E(\xi_\gamma \xi_\delta) = c^{\alpha}_{\gamma} c^{\beta}_{\delta} \sigma_{\alpha\beta}$,    $E(\xi_\gamma \eta_\delta) = c^{\alpha}_{\gamma} d^{i}_{\delta} \sigma_{\alpha i}$,
    $E(\xi_\delta \eta_\gamma) = c^{\alpha}_{\delta} d^{i}_{\gamma} \sigma_{\alpha i}$,    $E(\eta_\gamma \eta_\delta) = d^{i}_{\gamma} d^{j}_{\delta} \sigma_{ij}$.             (28.89)

Multiply the first of (28.88) by $d^{i}_{\delta}$ and sum. Using (28.89), we have

    $E(\xi_\gamma \eta_\delta) = \rho_\gamma\, E(\eta_\gamma \eta_\delta)$.                                      (28.90)

Similarly from the second of (28.88) multiplied by $c^{\alpha}_{\delta}$,

    $E(\xi_\delta \eta_\gamma) = \rho_\gamma\, E(\xi_\gamma \xi_\delta)$.                                      (28.91)

Interchanging $\gamma$ and $\delta$ we find from (28.90) and (28.91)

    $\rho_\gamma\, E(\eta_\gamma \eta_\delta) = \rho_\delta\, E(\xi_\gamma \xi_\delta)$.                                  (28.92)

Equally, again interchanging $\gamma$ and $\delta$ in (28.92) we have

    $\rho_\delta\, E(\eta_\gamma \eta_\delta) = \rho_\gamma\, E(\xi_\gamma \xi_\delta)$.                                  (28.93)

Thus, unless $\rho_\gamma^2 = \rho_\delta^2$,

    $E(\xi_\gamma \xi_\delta) = E(\eta_\gamma \eta_\delta) = 0$.                                      (28.94)

It follows from (28.90) and (28.91) that the other correlations also vanish.

We have only to round off the proof by showing that if $\rho$ is a root of multiplicity $m$
the property still holds. This follows from the consideration that we may then choose
our $c$'s and $d$'s to obey certain orthogonal conditions ensuring that

    $E(\xi_\gamma \xi_\delta) + E(\eta_\gamma \eta_\delta) = 0$.                                      (28.95)

It will then follow from (28.92) that each expectation vanishes unless $\rho_\gamma = \rho_\delta = 0$; and
even in this case, (28.91) and (28.92) show that two expectations vanish, and we may then
choose our assignable constants so that the others vanish.


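28.26. When the variates are put into canonical form the dispersion matrix reduces to

    $\begin{pmatrix}
    1 & 0 & \cdots & 0 & \rho_1 & 0 & \cdots & 0 & \cdots & 0 \\
    0 & 1 & \cdots & 0 & 0 & \rho_2 & \cdots & 0 & \cdots & 0 \\
    \vdots & & & \vdots & \vdots & & & \vdots & & \vdots \\
    0 & 0 & \cdots & 1 & 0 & 0 & \cdots & \rho_p & \cdots & 0 \\
    \rho_1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & \cdots & 0 \\
    0 & \rho_2 & \cdots & 0 & 0 & 1 & \cdots & 0 & \cdots & 0 \\
    \vdots & & & \vdots & \vdots & & & \vdots & & \vdots \\
    0 & 0 & \cdots & \rho_p & 0 & 0 & \cdots & 1 & \cdots & 0 \\
    \vdots & & & \vdots & \vdots & & & \vdots & & \vdots \\
    0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & \cdots & 1
    \end{pmatrix}$

with a determinant equal to

    $(1 - \rho_1^2)(1 - \rho_2^2) \cdots (1 - \rho_p^2)$.                              (28.96)
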


Example 28.4 (from Hotelling, dealing with data of T. L. Kelley).

140 seventh-grade school children were given four tests in (a) reading speed, (b) reading
power, (c) arithmetic speed, and (d) arithmetic power. It is required to find canonical
variates for the two reading tests and the two arithmetic tests.

The correlations between the variates were:

             x_1        x_2        x_3        x_4
    x_1     1.0000     0.6328     0.2412     0.0586
    x_2     0.6328     1.0000    -0.0553     0.0655
    x_3     0.2412    -0.0553     1.0000     0.4248
    x_4     0.0586     0.0655     0.4248     1.0000

The determinant (28.83) becomes

    $\begin{vmatrix}
    -\lambda & -0.6328\lambda & 0.2412 & 0.0586 \\
    -0.6328\lambda & -\lambda & -0.0553 & 0.0655 \\
    0.2412 & -0.0553 & -\lambda & -0.4248\lambda \\
    0.0586 & 0.0655 & -0.4248\lambda & -\lambda
    \end{vmatrix} = 0$

or

    $0.491370\, \lambda^4 - 0.0788034\, \lambda^2 + 0.000362490 = 0$,

giving  $\lambda^2 = 0.155635$ or $0.004740$

with  $\lambda = 0.3945$ or $0.0688$.

To find the transformed variates themselves we use (28.81). For instance, with the
root 0.3945 for $\lambda$, we have

    $c^1 + 0.6328\, c^2 - 0.6114\, d^1 - 0.1485\, d^2 = 0$
    $0.6328\, c^1 + c^2 + 0.1402\, d^1 - 0.1660\, d^2 = 0$
    $-0.6114\, c^1 + 0.1402\, c^2 + d^1 + 0.4248\, d^2 = 0$
    $-0.1485\, c^1 - 0.1660\, c^2 + 0.4248\, d^1 + d^2 = 0$

The last equation is linearly dependent on the other three, so adds nothing. In the other
three we solve for the ratios of $c$'s and $d$'s, finding

    $c^1 : c^2 : d^1 : d^2 = -2.7772 : 2.2655 : -2.4404 : 1$.

Thus the transformed variates are

    $\xi_1 = k_1 (-2.7772\, x_1 + 2.2655\, x_2)$
    $\xi_3 = k_2 (-2.4404\, x_3 + x_4)$,

where $k_1$ and $k_2$ may be chosen so that the variances of $\xi_1$ and $\xi_3$ are unity, if desired. Similar
equations with the root 0.0688 will give us a further pair of canonical co-ordinates. Those
we have worked out have the maximum correlation, the other pair having the minimum
and therefore being of less interest.
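
The roots of Example 28.4 may be verified by a short computation. The sketch below is in Python and relies on a property of the reduced determinant (28.87): for $p = q = 2$ the squared canonical correlations are the latent roots of the 2 × 2 matrix $R_{11}^{-1} R_{12} R_{22}^{-1} R_{21}$, where the $R$'s are the blocks of the correlation matrix. The helper functions `inv2`, `mul2` and `transpose2` are hypothetical names introduced for the illustration.

```python
import math


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def mul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose2(m):
    """Transpose of a 2 x 2 matrix."""
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


# Blocks of the correlation matrix of the example.
R11 = [[1.0, 0.6328], [0.6328, 1.0]]        # reading tests
R22 = [[1.0, 0.4248], [0.4248, 1.0]]        # arithmetic tests
R12 = [[0.2412, 0.0586], [-0.0553, 0.0655]] # reading with arithmetic

M = mul2(mul2(inv2(R11), R12), mul2(inv2(R22), transpose2(R12)))

# Latent roots of a 2 x 2 matrix from its trace and determinant.
trace = M[0][0] + M[1][1]
determinant = M[0][0] * M[1][1] - M[0][1] * M[1][0]
half_gap = math.sqrt(trace * trace / 4.0 - determinant)
lambda_squared = [trace / 2.0 + half_gap, trace / 2.0 - half_gap]
canonical_correlations = [math.sqrt(v) for v in lambda_squared]

print(lambda_squared)
print(canonical_correlations)
```

The two values printed last should agree with the roots 0.3945 and 0.0688 found from the quartic, since the quadratic in $\lambda^2$ solved here is the quartic of the example divided by its leading coefficient.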

28.27. In practical cases it is of some importance to know whether an observed
canonical correlation $r_1$, say, is significant of real correlation. The problem has been solved
for large samples but not completely for small samples. We shall conclude this chapter
with a short account of the main results which have been reached.

For large samples we shall show that, for the standard error of a canonical correlation,

    $\mathrm{var}\ r = \dfrac{1}{n} (1 - \rho^2)^2$,                                       (28.97)

a remarkable result showing that the variance is the same as for a product-moment
coefficient.
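
By way of illustration, (28.97) may be applied to the larger root of Example 28.4. The Python sketch below makes the usual large-sample assumption, not stated in the text, that the observed $r$ may be substituted for the unknown $\rho$; the function name is hypothetical.

```python
import math


def canonical_se(r, n):
    """Large-sample standard error (1 - r^2) / sqrt(n), from (28.97).

    Assumes the sample value r is used in place of the parent rho.
    """
    return (1.0 - r * r) / math.sqrt(n)


se = canonical_se(0.3945, 140)   # larger root of Example 28.4, 140 children
ratio = 0.3945 / se
print(se, ratio)
```

On this footing the larger canonical correlation is several times its standard error.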

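Denoting as usual the sample covariance by $a_{ij}$ we have to the first order

    $E(a_{ij}) = \sigma_{ij}$.                                               (28.98)

To the same order,

    $E(a_{ij}\, a_{kl}) = \dfrac{1}{n^2}\, E \left\{ \sum_{\alpha} x_{i\alpha} x_{j\alpha} \sum_{\beta} x_{k\beta} x_{l\beta} \right\}$.

If $\alpha \ne \beta$ the sums on the right are independent, and there are $n(n - 1)$ such cases. When
$\alpha = \beta$ we have $n$ terms such as

    $E(x_i x_j x_k x_l) = \sigma_{ij}\sigma_{kl} + \sigma_{il}\sigma_{jk} + \sigma_{ik}\sigma_{jl}$,                      (28.99)

as follows from the consideration that the characteristic function of the multivariate normal
form is

    $\exp(-\tfrac{1}{2} \sigma_{\alpha\beta}\, t_{\alpha} t_{\beta})$

(cf. 15.12, vol. I, p. 376).
Hence we have

    $E(a_{ij}\, a_{kl}) = \dfrac{n(n-1)}{n^2}\, \sigma_{ij}\sigma_{kl} + \dfrac{1}{n} (\sigma_{ij}\sigma_{kl} + \sigma_{il}\sigma_{jk} + \sigma_{ik}\sigma_{jl})$

    $= \sigma_{ij}\sigma_{kl} + \dfrac{1}{n} (\sigma_{il}\sigma_{jk} + \sigma_{ik}\sigma_{jl})$.                          (28.100)

Thus

    $E(da_{ij}\, da_{kl}) = E(a_{ij}\, a_{kl}) - \sigma_{ij}\sigma_{kl}$

    $= \dfrac{1}{n} (\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk})$.                                 (28.101)
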

Now for any canonical correlation r we have 


c“ a, 


'a/3 


1 , 

C“ d'^ a: 


d^ a^.j 


(28.101) 


(28.102) 


If now we define for the sampling deviations in c’s and ri’s corresponding to deviations 
in the a’s, 


we find 


= i: l--~ 

t, M OGiu 


2 c“ Ac^ + c“ Aa^ = 0 


oCiS 


2 cZ® Ad^ + cZ® cZ^ == 0 

ATi = 


(28.103) 


(28.104) 


Without loss of generality we may now suppose the variates canonical and hence put
$c^1 = 1$, $c^2 = \ldots = c^p = 0$, $d^1 = 1$, $d^2 = \ldots = d^q = 0$. We then find

    $2\Delta c^1 + \Delta a_{11} = 0$,    $2\Delta d^1 + \Delta a_{p+1,\,p+1} = 0$,
    $\Delta r_1 = r_1 \Delta d^1 + r_1 \Delta c^1 + \Delta a_{1,\,p+1}$.

Substituting from the first two in the third of these equations we have

    $\Delta r_1 = \Delta a_{1,\,p+1} - \tfrac{1}{2} r_1 (\Delta a_{11} + \Delta a_{p+1,\,p+1})$.

Similar equations apply for any other simple root, e.g.

    $\Delta r_2 = \Delta a_{2,\,p+2} - \tfrac{1}{2} r_2 (\Delta a_{22} + \Delta a_{p+2,\,p+2})$.

Squaring these equations and substituting from (28.101) we find

    $n\, E(\Delta r_1)^2 = (1 - \rho_1^2)^2$,                                     (28.105)

    $E(\Delta r_1\, \Delta r_2) = 0$.                                          (28.106)

It follows that

    $\mathrm{var}\ r_1 = \dfrac{1}{n} (1 - \rho_1^2)^2$,    $\mathrm{cov}\ (r_1, r_2) = 0$                  (28.107)

to our order of approximation.


28.28. Equation (28.107) applies to a simple non-vanishing correlation. If a
canonical correlation vanishes and $p = q$, the result holds, with the qualification that sample
values of $r$ near the zero root must be allowed to have positive or negative values, or
alternatively that the distribution of $r$ is that of absolute values of a normal variate (cf. Exercise
28.7). If $p = 2$, $q > 2$ a zero root is of multiplicity $q$ at least. In this case, if it has exactly
multiplicity $q$, $nr^2$ is distributed as $\chi^2$ with $q - 1$ degrees of freedom. For the proof of
this result see Hotelling (1936b).

There is another rather curious difficulty in testing the significance of roots of the
equation giving the canonical correlations, namely, that if several roots exist it is not
possible to relate them with certainty to specified parent correlations: any one might have
arisen from any one of the parent values. This is not serious for large samples when the
roots are distinct, since the sample values cluster closely round the parent values; but
for small samples or canonical correlations in the parent which are close together it presents
a theoretical problem of a novel kind. See Hotelling (1936b) and Bartlett (1941) on
this point.


28.29. We proceed to find the sampling distribution of canonical correlations in the
case when the parent values are all zero and the $p$-variates and $q$-variates accordingly
independent.

Reverting to equation (28.87) in the form appropriate to samples, we have 


1 CLyk 1 = 0 . 

We write 

Ojyk 

and -f 

so that (28.108) becomes 

I {Hy + -hy\= 0- 


. (28.108) 

. (28.109) 
. (28.110) 


. (28.111) 


The significance of this device is that z and t are distributed independently in Wishart’s 
form, as we now proceed to show. 

One instructive way of looking at the problem is to consider the regression of the 
p-way vector on a g-way vector x. Corresponding to the univariate equation 


y = 6a: + e, _ . . . . . (28.112) 

where e is a residual, we have 

2/a = (28.113) 

where the 6’s are given by minimising the sum of n values 

^ (2/a - K 

namely, by 

(2/a *i) - ^a ^ iPk ^i) = 0 

or, in our notation for canonical variates, 

«at - ^a = 0, 

which yields 

(28.114) 

We may analyse the variance of y in the form — 

^ ivl) = ^ iK. 

= .... (28.115) 

corresponding to the univariate case 

and the two constituents on the right in (28.115) are independent, just as in the univariate 
case. This may be shown by a direct extension of 22.19.

Furthermore, if we wish to find the linear function of the $y$'s, say $l^{\alpha} y_{\alpha}$, which has
maximum correlation with the $x$'s, we have to maximise the ratio


This is equivalent to maximising unconditionally 

X^ (bi bji a^j — r2 = 0, 

giving, for r^, the equation — 

I b'L bi a.y - r 2 \ = 0 . 

Now in virtue of (28.114) this reduces to 


. (28.116) 


. (28.117) 


\r^a^^- a;j \ = 0 

or 

\ \ = 0, .... (28.118) 

which is equivalent to (28.108) with a slight change of notation. This must be so, for
we arrived at both equations on essentially the same assumptions. Now we see that the
term on the right in the determinant of (28.118) is the first item on the right of the variance
analysis given by (28.115), and the other term in the determinant is the sum $\Sigma (y^2)$ of the
analysis. It follows that $z$ and $t$ of (28.111) are independent, for they are the constituent
items of the analysis. Furthermore, the $z$'s will be distributed as sums of squares or
products about the means with $n - q$ degrees of freedom, that is in Wishart's form; and
similarly the $t$'s are distributed as $q$ sums of squares or products about the origin, i.e. in
Wishart's form with $n = q + 1$.


28 . 30 . Without loss of generality we may take the parent variances to be unity ; 
the covariances are zero by hypothesis. The joint distribution of z and t is then, from 
(28.26), 


t |l {a-P~\) I 2 j4 {n-fi-p - a) exp I — 1 ^ \ndtdz 


dF 


i I 


P c 

2ip (»i+i) jjip ip-i) 77 J i"’ 
'i 1. L 


q -|- 1 




•) 


(28.119) 


In the determinant

$$| \lambda^2 (z + t) - t | = 0$$

put $u = \lambda^2$ and let the roots in u be arranged in descending order of magnitude. Consider
the distribution for a given value of and z.^ which in particular we take to be Let us 
choose new variates from a set obeying the orthogonality conditions — 


p 

*=1 

= 0 if i ^ j 
— 1 if i — j. 


hj 






v] 


^ (f« hh) = Sij- 

k 


Make the transformation 


. (28.120) 

. (28.121) 
. (28.122) 
Instead of the $\tfrac12 p(p+1)$ values of t we will take the p values of u and $\tfrac12 p(p-1)$ of the
l’s as our new variates. We have


I ( I = 1 = % 

fc— 1 


. (28.123)


1*1 = -««%) I = 5,(1 -“-i) 


. (28.124) 


and have only to consider the Jacobian. This is clearly of degree $\tfrac12 p(p-1)$ in u, for the
Jacobian of t and z + t is the same as that of t and z and only t contributes factors in u
in the former. Furthermore, every term $(u_i - u_j)$, i < j, is a factor of J. For consider
$u_1 - u_2$ and let us take as our l-variates those for which j > i. Then to satisfy the
conditions on the others, derivable from (28.120),


we must have 


whence 


^ iiijc = 0 , 


3L 





aiia 






3|^2 _ 


0 , 


31x2 3^ 


31x2 

j> 2, 

^ i^ik ^jk %') 


12 


- {Ui - Wa). 

?xx 


(28.125) 


Thus every term $(u_i - u_j)$ occurs in J, and there can be no further factors in u because
the power in u is $\tfrac12 p(p-1)$.

Substituting in (28.119) we have, integrating out the l-variates,

$$dF = c \prod_{i=1}^{p} \left\{ u_i^{\frac12(q-p-1)} (1 - u_i)^{\frac12(n-q-p-2)} \right\} \prod_{i<j} (u_i - u_j) \prod_{i=1}^{p} du_i \qquad (28.126)$$

where 

c = ^ . . . ( 28 . 127 ) 

The constant k arises from terms involving n and p in the original density and from the
Jacobian. It therefore does not involve q and may be written k(n, p). Evaluation of
k by direct integration is a matter of some difficulty, but we may find it indirectly
as follows:

In (28.126), if we increase q and n by 2s, the corresponding value of c is 


n 


r 


k {n ■+■ 2s, p) 


1 + 2s - i 


r 



. (28.128) 


The only other term in (28.126) which is affected is that in $\prod (u)$ and, with the original
c of (28.127), the integral of the distribution so modified would give us the moment of
order s of $\prod (u)$, namely of $|t| / |z + t|$. This may be found in the manner of 28.15 to be


77 


(see Exercise 28.11). It follows that 

k {n 2s, p) 


r ^ S' H" 1 i ^ T’ ^ ^ ^ ^ ^ 


(28.129) 


25 


k {n, p) 


n 




whence 




h {n, p) ^ n r ip). . 


(28.130) 


(28.131) 


It remains to evaluate f(p). To do so we make the substitution in (28.126)

$$u_i = \frac{2 v_i}{n},$$

letting n tend to infinity. Our distribution becomes

>-i) 

. exp (— Evy n [v^ — vy n dv. 






(28.132) 


This may be reduced by successive substitutions of the type

Wi -- w^, Vj = Wj + Vj_, j > 1, 

and choosing q at each stage so that the term in $\prod (v)$ vanishes (as we may, since the result
is independent of q). On integration for $v_1$, then repeating the process, and so on, we find

f{pj_^ nr{p^i-i) 
nr(v + 2 




1. 


Using the relation

$$\Gamma(x)\, \Gamma(x + \tfrac12) = 2^{-2x+1} \sqrt{\pi}\; \Gamma(2x),$$

we have


fip) 


TV 


Ip 


nr 




— t 


Thus our distribution is finally

$$dF = c \prod_{i=1}^{p} \left\{ u_i^{\frac12(q-p-1)} (1 - u_i)^{\frac12(n-q-p-2)} \right\} \prod_{i<j} (u_i - u_j) \prod_{i=1}^{p} du_i,$$

where


F 


P 


n 


( n — i\ 

/ 




(28.133) 


(28.134) 


(28.135) 


a remarkable form obtained in the general case by Fisher (1939b), P. L. Hsu (1939b), and
Roy (1939b).

We have supposed throughout that q > p. In the contrary case we reverse the roles
of q and p and hence merely have to interchange p and q in (28.134) and (28.135).

28.31. Let us consider some special cases. When q = 1 the distribution becomes

$$dF = \frac{\Gamma\{\tfrac12 (n-1)\}}{\Gamma\{\tfrac12 (n-p-1)\}\, \Gamma(\tfrac12 p)}\; u_1^{\frac12(p-2)} (1 - u_1)^{\frac12(n-p-3)}\, du_1 \qquad (28.136)$$

confirming the distribution of equation (28.40) leading to Hotelling’s distribution; for
the canonical correlation is then the multiple correlation between the q-variate and the
p-variates; and as the former is measured from its mean there is one fewer degree of
freedom, i.e. n is replaced by n − 1.

When q = 2 we have

$$dF = \frac{\Gamma(n-2)}{4\, \Gamma(n-p-2)\, \Gamma(p-1)}\; (u_1 u_2)^{\frac12(p-3)} \{(1 - u_1)(1 - u_2)\}^{\frac12(n-p-4)} (u_1 - u_2)\, du_1\, du_2. \qquad (28.137)$$


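Writing

$$(1 - u_1)(1 - u_2) = v, \qquad u_1 + u_2 = w,$$

we find

$$dF = \frac{\Gamma(n-2)}{4\, \Gamma(n-p-2)\, \Gamma(p-1)}\; (v - 1 + w)^{\frac12(p-3)}\, v^{\frac12(n-p-4)}\, dv\, dw. \qquad (28.138)$$

For given v the limits of w are 1 − v and 2(1 − √v), and integrating for w we find

$$dF = \frac{\Gamma(n-2)}{4\, \Gamma(n-p-2)\, \Gamma(p-1)} \cdot \frac{2}{p-1}\; v^{\frac12(n-p-4)} (1 - \sqrt{v})^{p-1}\, dv$$

or, for √v,

$$dF = \frac{1}{B(n-p-2,\, p)}\; (\sqrt{v})^{\,n-p-3} (1 - \sqrt{v})^{p-1}\, d\sqrt{v}, \qquad (28.139)$$

a result due to Wilks (cf. equation (28.62)).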
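The integration over w may be written out explicitly. What follows is a sketch of that single step, assuming that the joint density of v and w carries the factor $(v - 1 + w)^{\frac12(p-3)}$ and that w runs between the limits 1 − v and 2(1 − √v) stated in the text; it is offered as a check on the algebra, not as the original working.

```latex
\int_{1-v}^{2(1-\sqrt{v})} (v - 1 + w)^{\frac12(p-3)}\, dw
  = \left[ \frac{2}{p-1}\, (v - 1 + w)^{\frac12(p-1)} \right]_{1-v}^{2(1-\sqrt{v})}
  = \frac{2}{p-1}\, (1 - \sqrt{v})^{\,p-1},
\qquad \text{since } v - 1 + 2(1 - \sqrt{v}) = (1 - \sqrt{v})^{2}.
```

Putting x = √v, so that dv = 2x dx, and using (p − 1)Γ(p − 1) = Γ(p), the constant Γ(n − 2)/{4Γ(n − p − 2)Γ(p − 1)} multiplied by 2/(p − 1) and by 2 reduces to 1/B(n − p − 2, p), which is the normalising constant of a Beta density in √v.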


28.32. The distribution of the u’s does not immediately provide a test of significance
of the canonical correlations, except when there is only one of them. The criterion

$$v = \prod (1 - u) \qquad (28.140)$$

is sometimes useful in the general case for testing simultaneously the departure of the
u’s from zero. Cf. Exercises 28.11 and 28.12.
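The criterion is easily computed once the roots are in hand. The following Python sketch is illustrative only: the function names `wilks_criterion` and `bartlett_statistic` are hypothetical, the roots supplied are made-up values and not data from the text, and the multiplier is the one quoted in Exercise 28.12, under which the statistic is said to tend to a χ² form with pq degrees of freedom for large n.

```python
import math


def wilks_criterion(u):
    """Return v, the product of (1 - u_i) over the supplied roots, as in (28.140)."""
    v = 1.0
    for ui in u:
        if not 0.0 <= ui < 1.0:
            raise ValueError("each root u must satisfy 0 <= u < 1")
        v *= 1.0 - ui
    return v


def bartlett_statistic(u, n, p, q):
    """Return -{n - 1 - (p + q + 1)/2} log v, the large-sample form of Exercise 28.12."""
    return -(n - 1 - 0.5 * (p + q + 1)) * math.log(wilks_criterion(u))


# Hypothetical roots, chosen only for illustration.
roots = [0.36, 0.04]
print(wilks_criterion(roots))
print(bartlett_statistic(roots, n=50, p=2, q=3))
```

When every root is zero the criterion is unity and the statistic vanishes; the further the roots depart from zero, the smaller v and the larger the statistic.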


NOTES AND REFERENCES 

Among earlier papers in which various aspects of the multivariate problem began to
be studied, reference may be made to Karl Pearson (1926b) on the “coefficient of racial
likeness” and Ragnar Frisch (1929), who independently arrived at the dispersion matrix
and proposed to call its determinant in standard measure the “scatterance”. Reference
to the papers by Wishart (1928), Wishart and Bartlett (1933c) and Hotelling (1931) on the
generalised product-moment distribution and the generalised “Student” ratio has been
made in the text.

In more recent literature three lines of development are discernible:

(a) American writers have developed the theory of canonical correlation and multiple
analysis mainly on algebraic and analytical lines. See Hotelling (1933, 1936b), Wilks
(1932c, 1934, 1935b, 1935c, 1936, 1943), Girshik (1939), and Madow (1938).

(b) English schools have investigated the theory of discriminant functions and
developed the sampling theory of canonical roots. See R. A. Fisher (1936a, b, 1938c, 1939b,
19407), P. L. Hsu (1938c, 1939b, 1941a, c, d), and for illustrative material Martin (1936),
Barnard (1935), Fairfield Smith (1936) and Wallace and Travers (1938). See also Bartlett
(1934b, 1938c, 1939b, c, 1941), E. S. Pearson and Wilks (1933b), Welch (1939b), Lawley
(1938) and Bishop (1939). Simaika (1941) has proved that tests based on Hotelling’s T
and the multiple correlation coefficient are uniformly most powerful in the class depending
on a single parameter.

(c) The Indian school, whose contribution has not been referred to in this chapter,
has developed some interesting work based on what is known as the D²-statistic. See
Mahalanobis (1930, 1936a), Mahalanobis, Bose and Roy (1936b), R. C. Bose (1936a), R. C.
Bose and Roy (1938c), and later papers in Sankhyā. If, with two samples from p-variate
populations, $d_i$ is the difference of sample means for the ith variate, the studentised
D²-statistic is

$$D^2 = \frac{1}{p} \sum a^{ij} d_i d_j$$

where $a^{ij}$ refers to the reciprocal of the sample dispersion matrix. Bose and Roy have
shown that in normal samples this has the same distribution as one of Fisher’s forms for
the multiple correlation coefficient. The corresponding parameter for the population

$$\Delta^2 = \frac{1}{p} \sum \alpha^{ij} \delta_i \delta_j$$

is known as Mahalanobis’s generalised distance.


EXERCISES 

28.1. In a four-variate normal distribution show that the correlation between the
covariances $a_{12}$ and $a_{34}$ is

$$\frac{\rho_{13}\rho_{24} + \rho_{14}\rho_{23}}{\sqrt{(1 + \rho_{12}^2)(1 + \rho_{34}^2)}}.$$

(Wishart, 1928.)


28.2. For a pair of normal variates with correlation p, show that, defining v by 

n ai 2 

^ UiOa (1 - p^)’ 

we have, for the frequency function of v

VS2‘»-‘ J"(^^ 
for v > 0 and a similar expression with −v for v inside curly brackets if v < 0. Here
K is the Bessel function of second kind with imaginary argument.

(Wishart and Bartlett, 1933c. See also K. Pearson and others, 1929.) 

28.3. Show that if k sets of variates, h = 1 ... k; i, j = 1 ... p, are each
distributed in Wishart’s form, with sample numbers then the variates 

k 

are also distributed in Wishart’s form with n — (This follows readily from the 

h=l 

characteristic function. It is a generalisation of the additive properties of χ².)


28.4. If a sample of n is chosen from a p-variate normal population, the variates
being grouped into h classes x^, » - • - 5 ^ 351 + . 

. . . Xrp, consider the function — 


where r - = 1 and is zero if the variates belong to different classes and equals the cor- 

%1m f'J 

relation if they belong to the same class. 

By considering the function 

A = Tfi” 

show that 



(Wilks, 1935b. The distribution provides a test of the independence of k sets of normal variates.)


28.5. As a particular case of the last exercise, show that if a single variate is 
independent of a second set . . . x^^, then — 



and hence find the distribution of the multiple correlation coefficient when the parent 
coefficient is zero. 

(Wilks, 1935b.)


28.6. Show algebraically that Hotelling’s T is invariant under linear transformations 
of the p variates. 

28.7. If the determinantal equation (28.83) with p — q has a double root equal to 
zero, show that for large samples the value of r corresponding to the canonical correlation 
is given by omitting all terms in the determinant when expanded, except those in and
A®. Noting that the latter is a perfect square, show that r is the ratio of a polynomial 
in the sample dispersions to a non-vanishing function regular in the neighbourhood of 
zero. Hence that (28.107) holds when p = 0. 

(Hotelling, 1936b.)


28.8. In the notation of 28.23, if 


A = 


a 


'a/3 h 


£ = I (T. 


xj 


c 


0 

: 


D = 


^cd 










^icc 



show that the vector correlation coefficient K defined by

and the square of the vector alienation coefficient Z defined by

D 


Z 


AB 


are invariant under linear transformations of the variate. Also that

$$K = \pm \rho_1 \rho_2 \cdots \rho_p$$

$$Z = (1 - \rho_1^2)(1 - \rho_2^2) \cdots (1 - \rho_p^2)$$

where the ρ’s are canonical correlations.

(Hotelling, 1936b.)

28.9. In the notation of the previous exercise, k and z being the sample values of
K and Z, show that if the population canonical correlations are all distinct, 

f(i -p?)M 


varfc = ^ y 

n 


Pi 


var z 


n 


V 

■i.=l 


cov (k, z) 

In particular, when p = 2,


2 

n 


- pi). 

i=l 


var i = i { (1 - -Z{\+K^)) 

n 

var^ = — (1 - Z + «>) 

n 


cov {k, z) 


n 


KZ{\+Z - K^). 


(Hotelling, 1936b.)

28.10. In the previous exercise, with p = q = 2, show that, in standard measure,

h = ^13 ^24 " ^14 ^23 

~ {(1 -'■y (1 - ('ll) f 

and hence derive a test of significance of the “ tetrad difference ” J'as- 

(Hotelling, 1936b.)


28.11. In the notation of Exercise 28.9, show that 


E{t^) = n 


1 ^ ^ j p + g + 2/$ - i 


n - % 


(Girshik, 1939.) 


28.12. Find the characteristic function of −log z, where z is defined as in the
previous exercise, and hence show that −n log z or, to a better approximation,
$-\{n - 1 - \tfrac12 (p + q + 1)\} \log z$ tends to be distributed as χ² with pq degrees of freedom
when n is large.

(Bartlett, 1938c.)



CHAPTER 29 


TIME-SERIES— (1) 

29.1. A time-series, as its name indicates, is a series of values assumed by a variable
at different points of time. We shall consider only cases where the variable is univariate
and shall denote its value at time t by $u_t$. The study of such series forms an important
branch of statistics because the majority of types of time-variation encountered in practice
are not of the regular functional type in which $u_t$ can be represented exactly by a
mathematical function of t, but present in some degree those irregularities of a random character
which can only be discussed in terms of probability. One of our main problems, in fact,
will be to isolate systematic from casual effects in the series so as to be able to study
them separately.

29.2. In general it is possible to observe a time-variable at any instant, and thus 
the temporal intervals between successive members of the series need not be the same. 
Practice and theory alike, however, usually require the observations to occur at regular 
intervals, and in the sequel we shall assume, unless the contrary is specifically stated, that 
the interval from each observation to the next is the same throughout the series. As 
a matter of convenience we may take this interval as our time-unit and write the series as 

$$u_1, u_2, u_3, \dots, u_t, \dots \qquad (29.1)$$

where t must be an integer. Where a series extends backwards and forwards from some 
given point which we wish to regard as origin we may write it as 

$$\dots, u_{-2}, u_{-1}, u_0, u_1, u_2, \dots, u_t, \dots \qquad (29.2)$$

In this chapter and the next we shall study the way in which $u_t$ varies with t, such variation
being in general of the stochastic type, that is to say, involving random variables. 

Some Examples of Time-series

29.3. Tables 29.1 to 29.5 provide some examples of the kind of variation encountered 
in practice. Table 29.1 (illustrated in Fig. 29.1) gives the annual yields per acre of barley 
in England and Wales from 1884 to 1939. Table 29.2 (Fig. 29.2) shows the human popula- 
tion of England and Wales at ten-yearly intervals from 1811 to 1931. Table 29.3 (Fig. 29.3) 
gives the sheep population of England and Wales for each year from 1867 to 1939. 
Table 29.4 (Fig. 29.4) gives the annual rainfall in London for each year from 1813 to 1912. 
Table 29.5 (Fig. 29.5) gives the average egg-production per laying hen in the U.S.A. for
each month of the years 1938 to 1940. 
TABLE 29.1

Annual Yields per Acre of Barley in England and Wales from 1884 to 1939.

(Data from the Agricultural Statistics.)

Year.   Yield per acre (cwts.).

Fig. 29.1. Graph of the Data of Table 29.1 (Barley Yields per Acre). Horizontal axis: Years.
TABLE 29.2

Population of England and Wales at Ten-Yearly Intervals from 1811 to 1931.

(Data from the Registrar-General’s Statistical Review, 1933, Part II.)

Year.   Population (millions).
1811    10.16
  21    12.00
  31    13.90
  41    15.91
  51    17.93
  61    20.07
  71    22.71
  81    25.97
  91    29.00
1901    32.53
  11    36.07
  21    37.89
  31    39.95

Fig. 29.2. Graph of the Data of Table 29.2 (Population of England and Wales). Horizontal axis: Years, 1811 to 1931.
TABLE 29.3

Sheep Population of England and Wales for each Year from 1867 to 1939.

(Data from the Agricultural Statistics.)

Year.  Population   Year.  Population   Year.  Population   Year.  Population
       (10,000).           (10,000).           (10,000).           (10,000).
1867     2203       1886     1892       1905     1823       1924     1484
  68     2360         87     1919         06     1843         25     1597
  69     2254         88     1853         07     1880         26     1686
  70     2165         89     1868         08     1968         27     1707
  71     2024         90     1991         09     2029         28     1640
  72     2078         91     2111         10     1996         29     1611
  73     2214         92     2119         11     1933         30     1632
  74     2292         93     1991         12     1805         31     1775
  75     2207         94     1859         13     1713         32     1850
  76     2119         95     1866         14     1726         33     1809
  77     2119         96     1924         15     1752         34     1653
  78     2137         97     1892         16     1795         35     1648
  79     2132         98     1916         17     1717         36     1665
  80     1955         99     1968         18     1648         37     1627
  81     1785       1900     1928         19     1512         38     1791
  82     1747         01     1898         20     1338         39     1797
  83     1818         02     1850         21     1383
  84     1909         03     1841         22     1344
  85     1958         04     1824         23     1384

Fig. 29.3. Graph of the Data of Table 29.3 (Sheep Population).
TABLE 29.4

Total Annual Rainfall at London in Inches, for each Year from 1813 to 1912.
(Data from D. Brunt, Phil. Trans. A, 225, 247, 1926.)

Year.  Rainfall    Year.  Rainfall    Year.  Rainfall    Year.  Rainfall
       (inches).          (inches).          (inches).          (inches).
1813    23.56      1838    21.63      1863    21.59      1888    27.74
  14    26.07        39    27.49        64    16.93        89    23.85
  15    21.86        40    19.43        65    29.48        90    21.23
  16    31.24        41    31.13        66    31.60        91    28.15
  17    23.65        42    23.09        67    26.25        92    22.61
  18    23.88        43    25.85        68    23.40        93    19.80
  19    26.41        44    22.65        69    25.42        94    27.94
  20    22.67        45    22.75        70    21.32        95    21.47
  21    31.69        46    26.36        71    25.02        96    23.52
  22    23.86        47    17.70        72    33.86        97    22.86
  23    24.11        48    29.81        73    22.67        98    17.69
  24    32.43        49    22.93        74    18.82        99    22.54
  25    23.26        50    19.22        75    28.44      1900    23.28
  26    22.57        51    20.63        76    26.16        01    22.17
  27    23.00        52    35.34        77    28.17        02    20.84
  28    27.88        53    25.89        78    34.08        03    38.10
  29    25.32        54    18.65        79    33.82        04    20.65
  30    25.08        55    23.06        80    30.28        05    22.97
  31    27.76        56    22.21        81    27.92        06    24.26
  32    19.82        57    22.18        82    27.14        07    23.01
  33    24.78        58    18.77        83    24.40        08    23.67
  34    20.12        59    28.21        84    20.35        09    26.75
  35    24.34        60    32.24        85    26.64        10    26.36
  36    27.42        61    22.27        86    27.01        11    24.79
  37    19.44        62    27.57        87    19.21        12    27.88

Fig. 29.4. Graph of the Last 50 Terms of the Data of Table 29.4 (Rainfall). Horizontal axis: Years.
TABLE 29.5

Average Number of Eggs per Laying Hen in the U.S.A. for each Month of the Years 1938 to 1940.

(Data from Report of the Bureau of Agricultural Economics, U.S. Dept. of Agriculture, on the
Poultry and Egg Situation, March, 1941.)

Year.  Jan.  Feb.  Mar.  Apr.  May.  June.  July.  Aug.  Sept.  Oct.  Nov.  Dec.
1938    7.9   9.9  15.4  17.6  17.3  14.9   13.6   11.8   9.4    7.6   5.9   6.4
1939    8.0   9.7  14.9  17.0  17.0  14.6   13.2   11.7   9.3    7.4   6.0   6.8
1940    7.2   9.0  14.4  16.5  17.0  14.8   13.4   11.8   9.7    7.9   6.2   6.8

Fig. 29.5. Graph of the Data of Table 29.5 (Egg Production).


These series are fairly typical of the kind of material with which our theory has to 
deal. The data of Table 29.1 (barley yields) present a very irregular fluctuation, and so 
far as the eye can see (which is not a decisive test) there is no systematic oscillation and no 
regular movement in mean yields over the period. By contrast, Table 29.2 (human popula- 
tion) shows a relatively smooth movement without apparent oscillation. Table 29.3 (sheep 
population) combines a general decline in numbers with marked oscillatory effects which, 
though not perfectly regular, appear to be systematic to some extent. Tables 29.4 and
29.5 exhibit an oscillatory effect which is definitely seasonal for the latter and much less
regular for the former, neither indicating a variation, in the periods covered, of the average 
values about which the series oscillate. 

29.4. It must not be overlooked that our method of determining the values of the 
series at fixed equal intervals of time may suppress evidence of oscillatory movements 
which have a period equal to those intervals or to some sub-multiple of them. Suppose, 
for instance, that there was a systematic oscillation in the English population expressible 
by a harmonic component with period of exactly 10 years, or exactly 5 years, or exactly
3⅓ years. Clearly, by observing the series at 10-yearly intervals we should never find any
evidence of this effect, for it would contribute exactly the same amount to each observation,
without oscillation. In the population case, of course, we have collateral evidence to
indicate that no such oscillation exists, but where nothing is known of the series otherwise
we can never exclude the possibility of a period exactly equivalent to our time-interval.
Sometimes, in fact, we know that it is there, and choose our interval so as to exclude the
oscillation from consideration. For instance, in our sheep population we know that there
is a seasonal effect within the year, which is not brought out in Table 29.3 because the
sheep census is taken on June 4th each year; and again, in the rainfall data of Table 29.4
we have taken as representing the year the whole rainfall within the year, knowing quite 
well that rainfall is seasonal to some extent, even in London. 
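The suppression of such a period can be seen numerically. The Python sketch below is an illustration under stated assumptions: the harmonic is taken to be a pure sine wave of unit amplitude, the phase 0.7 and the number of observations (13, as in a series of decennial censuses) are arbitrary choices, and the function name `sampled_component` is hypothetical.

```python
import math


def sampled_component(period_years, interval_years=10, n_obs=13, phase=0.7):
    """Values of sin(2*pi*t/period + phase) observed every `interval_years` years."""
    return [
        math.sin(2 * math.pi * (k * interval_years) / period_years + phase)
        for k in range(n_obs)
    ]


# Periods equal to the interval, or to a sub-multiple of it, contribute the
# same amount to every observation, apart from rounding error.
for period in (10, 5, 10 / 3):
    values = sampled_component(period)
    spread = max(values) - min(values)
    print(f"period {period:.3f} years: spread of sampled values = {spread:.2e}")

# A period which is not a sub-multiple of the interval remains visible.
values = sampled_component(7)
print(f"period 7.000 years: spread of sampled values = {max(values) - min(values):.2f}")
```

For the first three periods the sampled values differ only by rounding error, so that the component is indistinguishable from a constant; a seven-year period, by contrast, shows a wide spread.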

29.5. A general survey of these and similar series suggests that the typical time- 
series may be regarded as composed of three parts : — 

(a) a trend, or long-term movement;

(b) an oscillation about the trend of greater or less regularity;

(c) a “random”, “irregular” or “unsystematic” component.

It is customary to regard the series as composed of these elements superposed one on 
another ; that is to say, we consider the movement of the series as the sum of three dif- 
ferent components which may be generated by different causal systems. Particular series, 
of course, need not exhibit them all. That of Table 29.2 (human population) seems 
to be almost entirely trend, with perhaps a small unsystematic residual, whereas that of 
Table 29.5 (egg production) appears to be entirely oscillatory, and very regularly so. 
But some series at least exhibit all three. 
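The additive scheme may be made concrete by building an artificial series from its three parts. The sketch below is purely illustrative: the linear trend, the twelve-term sine oscillation, the normal residual and all numerical constants are assumptions made for the example, and `synthetic_series` is a hypothetical name.

```python
import math
import random


def synthetic_series(n, seed=0):
    """Build an artificial series as trend + oscillation + random residual."""
    rng = random.Random(seed)  # fixed seed so that the example is reproducible
    trend = [0.05 * t for t in range(n)]
    oscillation = [math.sin(2 * math.pi * t / 12) for t in range(n)]
    residual = [rng.gauss(0.0, 0.3) for _ in range(n)]
    series = [a + b + c for a, b, c in zip(trend, oscillation, residual)]
    return series, trend, oscillation, residual


series, trend, oscillation, residual = synthetic_series(36)
for t in range(3):
    print(t, round(series[t], 3), round(trend[t], 3),
          round(oscillation[t], 3), round(residual[t], 3))
```

In practice only the sum is observed, and the problem of analysis is the reverse one of recovering the parts from it.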

29.6. The primary problem of time-series analysis from the statistical viewpoint is 
to isolate the three factors for individual study, and in this chapter and the next we shall 
be mainly concerned with various methods of carrying out the necessary analysis. Before 
proceeding, however, we must look a little more closely into the reality of the effects which 
we are investigating and the basis on which we assume that the analysis is legitimate. 

29.7. Perhaps the easiest component to understand and to remove from the series
is the seasonal effect. This is a fluctuation imposed on the series by a cyclic phenomenon
external to the main body of causal influences at work upon it. The oscillation in
egg-production in Table 29.5, for instance, reflects the rhythm in the reproductive process
which is found among birds in virtue, ultimately, of the fact that the earth goes round
the sun once a year. Strictly speaking, we ought to confine the word “seasonal” to those
effects which are annual in period; but where no confusion is likely to arise we can apply
the same word and the same ideas to any phenomenon generated by strictly periodic natural
processes, such as “spring” and “neap” variation in tides or daily variation in
temperature. We must, however, be careful about extending the notion of seasonality to phenomena
which are not demonstrated beyond reasonable doubt to depend on strictly periodic stimuli. 
For instance, it would be going too far, in the present state of our knowledge, to speak of 
sunspot variation as seasonal in this sense, and much too far to speak of seasonality in 
crop-yields as determined by sunspots, even if the relation between the two were estab- 
lished. We shall return to this point below when defining what we mean by a “ cycle ” 
as distinct from an “ oscillation ”. 

29.8. As we noted in 29.4, the seasonal effect may already be removed from the
series by the way in which the data are specified. Where we ourselves have any choice 
in the determination of the data, we may eliminate seasonality in the same way, namely, 
by selecting for measurement of the series a point of time which is fixed in relation to the 
year, such as June 4th for the agricultural returns of England and Wales, or by averaging 
over the year, or (what is much the same thing) by cumulating the series over the year, 
as for instance with rainfall data. 
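The effect of cumulating over the year may be illustrated on the egg-production figures. The Python sketch below uses the monthly values of Table 29.5 as transcribed here, with the decimal points restored; the function names are hypothetical and the comparison of ranges is our own illustration, not a computation taken from the text.

```python
# Monthly eggs per laying hen, January to December, transcribed from Table 29.5.
eggs = {
    1938: [7.9, 9.9, 15.4, 17.6, 17.3, 14.9, 13.6, 11.8, 9.4, 7.6, 5.9, 6.4],
    1939: [8.0, 9.7, 14.9, 17.0, 17.0, 14.6, 13.2, 11.7, 9.3, 7.4, 6.0, 6.8],
    1940: [7.2, 9.0, 14.4, 16.5, 17.0, 14.8, 13.4, 11.8, 9.7, 7.9, 6.2, 6.8],
}


def annual_totals(monthly_by_year):
    """Cumulate each year's twelve monthly values into a single annual figure."""
    return {year: sum(values) for year, values in monthly_by_year.items()}


def value_range(values):
    """Difference between the greatest and least of a set of values."""
    return max(values) - min(values)


totals = annual_totals(eggs)
for year in sorted(eggs):
    print(year, "annual total:", round(totals[year], 1),
          "range within the year:", round(value_range(eggs[year]), 1))
```

Within each year the monthly figures range over some eleven eggs per hen, whereas the three annual totals differ among themselves by only about three; the seasonal swing has disappeared from the cumulated series.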

29.9. The concept of trend is more difficult to define. Generally, one thinks of it
as a smooth broad motion of the system over a long term of years, but “ long ” in this con- 
nection is a relative term, and what is long for one purpose may be short for another. For 
example, if we were examining rainfall records over a hundred years a slow rise from the 
beginning of the period to the end would be regarded as a trend ; but if we possessed records 
for two thousand years (and the rings in some of the giant redwood trees give an index of 
climatic conditions for periods of this order) the rise over a particular century might appear 
as part of a slow oscillatory movement, so that any inference from the “ trend ” in a par- 
ticular century to the effect that the weather was likely to continue becoming wetter and 
wetter might be quite false. What inference we should make in practice would depend 
on what we were trying to do. If we were engineers designing a water-supply system and 
wished to provide against droughts of reasonable extent, we might perhaps assume that the 
trend would last as long as our works and proceed accordingly ; but if we were attempting 
to study climatic changes over the face of the earth for geological periods of time we should 
accept the continuance of the trend with the greatest reserve or, more probably, should 
reject it on collateral grounds. 

29.10. However long a series may be, we can never be certain, and often not even
reasonably sure, that a trend in it is not part of a slow oscillation, except of course when 
the series has terminated (as might, for instance, be the case if we were considering the 
lengths of reigns of the Roman Emperors). In speaking of a trend, therefore, we must 
bear in mind the length of the series to which our statement refers. Perhaps it would be 
more accurate to speak of slow or quick movements rather than of trend and oscillation, 
but even so the distinction between the two would remain a matter of subjective judgment 
to some extent. 

29.11. When seasonal variation and trend have been removed from the data we
are left with a series which will present, in general, fluctuations of a more or less regular 
kind. Fig. 29.1 represents the kind of series we obtain, since it has no components of 
trend or seasonality. The question then arises, is this residual series systematic in the 
sense that its values can be represented as a function of the time ? Or, on the other hand, 
are the values random in the sense that they could occur, in the observed order, by random 
sampling from a homogeneous population ? Or again, is there some possibility intermediate 
between complete functional variation and complete randomness ? The search for
systematic effects in residual fluctuation gives rise to several techniques of analysis, the object
of which is to detect whether any part of the series is subject to law, and therefore
predictable, and whether any part is purely haphazard. The former part we shall call systematic,
and it will be referred to as an “ oscillation ” (not a “ cycle ”, which is a very special case 
of an oscillation, as we shall see later). The remainder of the series we shall call the unsys- 
tematic component, and refer to its movements as “ random ”. When a series is a mixture 
of oscillation and random movement it will not cause any inconvenience to refer to the
up-and-down movement generally as fluctuation before we have analysed it into its con- 
stituents ; that is to say, we may speak of fluctuation without prejudice to the possibility 
of detecting oscillatory movements in it. 

In this chapter we study trend and random residuals. In the next chapter we shall 
deal with oscillatory and cyclical components. 

29.12. The logician or the economist who wants to be difficult can always maintain 
that, although any series can be separated into our three specified components as a matter 
of mathematical or statistical analysis, the results throw little or no light on the causal
influences at work to produce the series. To such a critic we have to concede, I think, 
that in carrying out the analysis we have at the back of our minds the strong possibility 
that the three elements are due to independent causal systems. If he refuses to accept 
this view — and some economists do — we can only invite him to produce a better statistical 
method. 

Possibly the reader will feel, on reaching the end of Chapter 30, that we have not been 
wasting our time, and that our methods do throw light on the way in which time-series 
behave. If not, he should consult some of the references and see whether he finds them 
statistically more satisfying. 

Determination of Trend 

29.13. It is an essential part of the concept of trend that the movement over fairly
long periods is smooth. This means that we can represent the trend component, at least
locally, by a polynomial in the time element t. Thus, given the series $u_t$ we may, in the
first instance, seek for some polynomial

$$U_t = a_0 + a_1 t + a_2 t^2 + \dots + a_p t^p \qquad (29.3)$$

which will give an account of the trend movement. By taking p great enough we can, of
course, obtain as close a representation as we like to a finite series; and how large we
take p is a matter for decision in particular cases.

If the polynomial is fitted to the whole series by least squares, it evidently gives the
curvilinear regression line of $u_t$ on the variable t. This method would then lead to the
fitting of regressions in the manner of Chapter 22, and we need not repeat here what has
been said on the subject in that chapter. In Example 22.7 we did, in fact, fit a quartic
to the population data of Table 29.2 and found a good fit.

29.14. It is, however, clear that to obtain a satisfactory trend-curve for data such 
as that of Table 29.3 (sheep population), we should have to take a polynomial of rather 
high order. This may appear somewhat artificial and in any case the coefficients of such 
a polynomial, being based on high-order moments, would be very unstable from the sampling 
viewpoint. A more practical objection, though by no means an unimportant one, is that 
if we add another term to the series, as for example if we are keeping an annual series up 
to date from year to year, the work of fitting has to be done afresh each time. Moreover, 
the trend-line may be affected throughout its length. When, therefore, the series has no 
very obvious trend such as that of Table 29.2 it is more convenient to use the simpler 
methods described below. 


Moving Averages

29.15. An alternative to finding a polynomial which will represent the whole series
is to determine a polynomial which will represent a part of it, and to use different
polynomials for different parts. The simplest method, and one which forms the basis of the
majority of methods of trend fitting, is to take the first m terms (m being chosen at will),
fit a polynomial of order not greater than m − 1 to them, and use that polynomial to
determine the value in the middle of its range; then to repeat the operation with the m
terms from the second to the (m + 1)th, and so on, moving on one term at each stage.
Unless other considerations require it, we take m to be odd, so that the middle point of
the range corresponds to a value which is actually observed. Otherwise the middle point
falls half-way between two observed values, or we have to use some value of the fitted
polynomial other than the middle point, which results in a loss of useful symmetry.

29.16. Suppose, then, that the number of terms is chosen to be odd and is denoted,
with a slight change of notation, by 2m + 1. Without loss of generality we may denote
the terms by $u_{-m}, u_{-m+1}, \dots, u_0, \dots, u_{m-1}, u_m$. If we choose to fit to them a
polynomial of the pth order (29.3) we may, in the usual way, determine the coefficients by
least squares, i.e. solve the equations

$$\frac{\partial}{\partial a_j} \sum_{t=-m}^{m} (u_t - a_0 - a_1 t - \dots - a_p t^p)^2 = 0, \quad j = 0 \dots p \qquad (29.4)$$

which will give us equations typified by

$$\Sigma (t^j u_t) - a_0 \Sigma (t^j) - a_1 \Sigma (t^{j+1}) - \dots - a_p \Sigma (t^{j+p}) = 0 \qquad (29.5)$$

Now the sums $\Sigma (t^j)$ are functions of m only. Thus, if we solve (29.5) for $a_0$ we shall find
an equation of the form

$$a_0 = c_1 u_{-m} + c_2 u_{-m+1} + \dots + c_{2m+1} u_m \qquad (29.6)$$

where the c’s depend on m and p, but not on the u’s.

Now $U_t$ assumes the value $a_0$ at t = 0 and hence this value, as given by (29.6), is the
value we require for the polynomial. As we see, this is equivalent to a weighted average
of the observed values, the weights being independent of which part of the series is taken.
Thus our process of fitting a trend-line consists of determining the constants c (which
depend on m and p and therefore give us a twofold element of choice) and then calculating,
for each consecutive set of (2m + 1) terms in the series, a value given by (29.6). If the
terms are $u_x, \dots, u_{2m+x}$, the calculated value will correspond to t = m + x. There will
be no values corresponding to the m terms at the beginning and the m terms at the end.


Example 29.1 

Suppose we have a series and wish to fit a curve which best approximates to sets of 
seven points ; and suppose we regard a cubic as providing a satisfactory approximation. 
What are the weights of the moving average ? 

We have m = 3 and p = 3, and our polynomial is

$$u_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3.$$

Taking our origin at t = 0, we find, for equations (29.5), in virtue of the fact that
$\Sigma(t^k) = 0$ for odd k,

$$\begin{aligned}
\Sigma(u) &= 7a_0 + 28a_2 \\
\Sigma(tu) &= 28a_1 + 196a_3 \\
\Sigma(t^2 u) &= 28a_0 + 196a_2 \\
\Sigma(t^3 u) &= 196a_1 + 1588a_3
\end{aligned}$$

giving, for $a_0$,

$$\begin{aligned}
a_0 &= \tfrac{1}{21}\{7\Sigma(u) - \Sigma(t^2 u)\} \\
    &= \tfrac{1}{21}(-2u_{-3} + 3u_{-2} + 6u_{-1} + 7u_0 + 6u_1 + 3u_2 - 2u_3).
\end{aligned}$$

We may write this conveniently as

$$\tfrac{1}{21}[-2, 3, 6, 7, 6, 3, -2]$$

or, when symmetrical formulae are used, as in the present case, by

$$\tfrac{1}{21}[-2, 3, 6, \mathbf{7}, \ldots],$$

denoting the middle term by heavy type.

To take a simple illustration. Suppose the series is given by the following values:

t:     1   2   3   4    5    6     7     8     9    10
u_t:   0   1   8   27   64   125   216   343   512   729

We have, for the trend value at t = 4,

$$a_0 = \tfrac{1}{21}\{(-2 \times 0) + (3 \times 1) + (6 \times 8) + (7 \times 27) + (6 \times 64) + (3 \times 125) - (2 \times 216)\} = \tfrac{1}{21}\{567\} = 27.$$

Similarly, at t = 6 we find

$$a_0 = \tfrac{1}{21}\{(-2 \times 8) + (3 \times 27) + \ldots - (2 \times 512)\} = 125.$$

In both cases the trend-value is equal to the actual value of the series, and this obviously
must be so when we note that we are fitting a cubic to the series

$$u_t = (t - 1)^3.$$

It will be observed that in this example we should have obtained the same value for
$a_0$ if we fitted quadratics instead of cubics; and generally the case p odd includes the
case of the next lowest (even) value of p, so that we need not give separate formulae for
even p.
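The weights of Example 29.1 can be checked mechanically. The following is a minimal sketch in Python, not a method given in the text: it solves the normal equations (29.5) in exact rational arithmetic and reads off the row giving $a_0$. The function names `ls_weights` and `moving_average` are our own, chosen for illustration.

```python
from fractions import Fraction

def ls_weights(m, p):
    """Weights c_j (j = -m..m) with a_0 = sum c_j u_j, where a_0 is the constant
    term of the least-squares polynomial of degree p fitted to 2m + 1 equally
    spaced points centred on t = 0 (hypothetical helper, not from the text)."""
    ts = range(-m, m + 1)
    n = p + 1
    # Left side of the normal equations: entry (r, c) is sum over t of t^(r+c).
    A = [[Fraction(sum(t ** (r + c) for t in ts)) for c in range(n)] for r in range(n)]
    # Right side, kept as coefficients of each u_t: row r holds t^r for every t.
    B = [[Fraction(t ** r) for t in ts] for r in range(n)]
    # Gauss-Jordan elimination on [A | B] in exact arithmetic.
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        B[col], B[piv] = B[piv], B[col]
        d = A[col][col]
        A[col] = [x / d for x in A[col]]
        B[col] = [x / d for x in B[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                B[r] = [x - f * y for x, y in zip(B[r], B[col])]
    return B[0]  # the row expressing a_0 in terms of the u's

def moving_average(series, weights):
    """Apply the weights to every run of len(weights) consecutive terms."""
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]

w = ls_weights(3, 3)                          # seven points, cubic
cubes = [(t - 1) ** 3 for t in range(1, 11)]  # the series of the illustration
trend = moving_average(cubes, w)              # trend values for t = 4, 5, 6, 7
print([str(x) for x in w])
print([str(x) for x in trend])
```

Calling `ls_weights(3, 2)` returns the same weights as `ls_weights(3, 3)`, which illustrates the remark that the case of odd p includes the next lower even value.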

29.17. Writing $a_0[k]$ for the value of $a_0$ calculated in the above manner for an average
of k successive terms, we find the following formulae up to p = 5. The reader may care
to verify them for himself as an exercise.

Quadratic and Cubic

$$\begin{aligned}
a_0[5]  &= \tfrac{1}{35}[-3, 12, \mathbf{17}, \ldots] \\
a_0[7]  &= \tfrac{1}{21}[-2, 3, 6, \mathbf{7}, \ldots] \\
a_0[9]  &= \tfrac{1}{231}[-21, 14, 39, 54, \mathbf{59}, \ldots] \\
a_0[11] &= \tfrac{1}{429}[-36, 9, 44, 69, 84, \mathbf{89}, \ldots] \\
a_0[13] &= \tfrac{1}{143}[-11, 0, 9, 16, 21, 24, \mathbf{25}, \ldots] \\
a_0[15] &= \tfrac{1}{1105}[-78, -13, 42, 87, 122, 147, 162, \mathbf{167}, \ldots] \\
a_0[17] &= \tfrac{1}{323}[-21, -6, 7, 18, 27, 34, 39, 42, \mathbf{43}, \ldots] \\
a_0[19] &= \tfrac{1}{2261}[-136, -51, 24, 89, 144, 189, 224, 249, 264, \mathbf{269}, \ldots] \\
a_0[21] &= \tfrac{1}{3059}[-171, -76, 9, 84, 149, 204, 249, 284, 309, 324, \mathbf{329}, \ldots]
\end{aligned} \tag{29.7}$$

Quartic and Quintic

$$\begin{aligned}
a_0[7]  &= \tfrac{1}{231}[5, -30, 75, \mathbf{131}, \ldots] \\
a_0[9]  &= \tfrac{1}{429}[15, -55, 30, 135, \mathbf{179}, \ldots] \\
a_0[11] &= \tfrac{1}{429}[18, -45, -10, 60, 120, \mathbf{143}, \ldots] \\
a_0[13] &= \tfrac{1}{2431}[110, -198, -135, 110, 390, 600, \mathbf{677}, \ldots] \\
a_0[15] &= \tfrac{1}{46189}[2145, -2860, -2937, -165, 3755, 7500, 10125, \mathbf{11063}, \ldots] \\
a_0[17] &= \tfrac{1}{4199}[195, -195, -260, -117, 135, 415, 660, 825, \mathbf{883}, \ldots] \\
a_0[19] &= \tfrac{1}{7429}[340, -255, -420, -290, 18, 405, 790, 1110, 1320, \mathbf{1393}, \ldots] \\
a_0[21] &= \tfrac{1}{260015}[11628, -6460, -13005, -11220, -3940, 6378, 17655, 28190, 36660, 42120, \mathbf{44003}, \ldots]
\end{aligned} \tag{29.8}$$


29.18. Several methods have been proposed to simplify the arithmetic of fitting
a trend-line by moving averages, the large numbers in some of the expressions in (29.7)
and (29.8) involving considerable labour in straightforward application. The simplest,
perhaps, is that of iterated averages.

Suppose we take an average of sets of four with equal weights (a very simple process)
and then another average of the same kind of that average. If the primary series is
$u_t$, the result of the first operation will be to give a series

$$v_1 = \tfrac{1}{4}(u_1 + u_2 + u_3 + u_4), \qquad v_2 = \tfrac{1}{4}(u_2 + u_3 + u_4 + u_5), \quad \text{etc.},$$

and that of the second operation to give

$$\tfrac{1}{4}(v_1 + v_2 + v_3 + v_4) = \tfrac{1}{16}[u_1 + 2u_2 + 3u_3 + 4u_4 + 3u_5 + 2u_6 + u_7]. \tag{29.9}$$

We may write this symbolically as

$$\{\tfrac{1}{4}[1, 1, 1, 1]\}^2 = \tfrac{1}{16}[1, 2, 3, \mathbf{4}, \ldots] \tag{29.10}$$

or, reserving the symbol $\tfrac{1}{k}[k]$ for a simple arithmetic mean of k terms, as

$$\tfrac{1}{16}[4]^2 = \tfrac{1}{16}[1, 2, 3, \mathbf{4}, \ldots] \tag{29.11}$$

Now compare the weights of the average derived in Example 29.1 for fitting a cubic
to seven points. Reduced to unit divisors we have for the weights of the latter

-0.0952, 0.1429, 0.2857, 0.3333 . . .

and for the weights of (29.9)

0.0625, 0.1250, 0.1875, 0.2500 . . .

The two are not identical, but they follow the same sort of course and it might be possible
to regard the latter as an approximation to the former. (We shall derive better
approximations presently, but this will serve for purposes of illustration.) Now the iterated
summation resulting in (29.9) is much easier to carry out than the single weighted averaging
process of Example 29.1. Generally, if we can find averages with simple integral weights,
preferably unity, which will, in conjunction, give approximations to the more complicated
weights of a single average, it is usually easier to use the iteration process.
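Iterating two averages is the same as convolving their weights, which is how (29.9) arises from two averages of four. A minimal sketch in Python follows; the helper name `convolve` is our own.

```python
from fractions import Fraction

def convolve(a, b):
    """Weights of the single average equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

four = [Fraction(1, 4)] * 4          # simple average of four
iterated = convolve(four, four)      # the average of (29.9)
cubic7 = [Fraction(x, 21) for x in (-2, 3, 6, 7, 6, 3, -2)]  # Example 29.1

# Print the two sets of weights side by side, to four decimal places.
for a, b in zip(iterated, cubic7):
    print(f"{float(a):8.4f} {float(b):8.4f}")
```

Both sets of weights sum to unity; the iterated average has no negative end weights, which is one reason it is only a rough approximation to the cubic formula.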


29.19. In the notation of finite differences, write

$$E u_t = u_{t+1} = (1 + \Delta) u_t \tag{29.12}$$

$$\Delta u_t = u_{t+1} - u_t \tag{29.13}$$

We have, for the second "central" difference $\delta^2 u_t$,

$$\delta^2 u_t = (u_{t+1} - u_t) - (u_t - u_{t-1}) = (E - 2 + E^{-1}) u_t \tag{29.14}$$

Writing

$$E = \exp(2i\phi) \tag{29.15}$$

we find, symbolically,

$$\delta^2 = E - 2 + E^{-1} \tag{29.16}$$

$$= \exp(2i\phi) + \exp(-2i\phi) - 2 = -4\sin^2\phi. \tag{29.17}$$

Then

$$\sum_{t=-m}^{m} u_t = \sum_{t=-m}^{m} E^t u_0 = \Big\{1 + 2\sum_{j=1}^{m} \cos 2j\phi\Big\} u_0,$$

since the terms in $\sin 2j\phi$ vanish,

$$= \frac{\sin (2m+1)\phi}{\sin\phi}\, u_0. \tag{29.18}$$

Thus, with k = 2m + 1,

$$\begin{aligned}
\frac{1}{k}[k]\,u_0 &= \frac{1}{k}\,\frac{\sin k\phi}{\sin\phi}\, u_0 \\
&= \Big\{1 - \frac{k^2-1}{3!}\sin^2\phi + \frac{(k^2-1)(k^2-9)}{5!}\sin^4\phi - \ldots\Big\} u_0 \\
&= u_0 + \frac{k^2-1}{2^2\, 3!}\,\delta^2 u_0 + \frac{(k^2-1)(k^2-9)}{2^4\, 5!}\,\delta^4 u_0 + \ldots
\end{aligned} \tag{29.19}$$

This interesting formula gives the arithmetic average in terms of the middle term and
its central differences. If the series is approximately represented by a cubic, so that fourth
differences vanish,

$$\frac{1}{k}[k]\,u_0 = u_0 + \frac{k^2-1}{24}\,\delta^2 u_0 \tag{29.20}$$

Similarly, for two iterated averages we have, to the same order,

$$\frac{1}{k_1 k_2}[k_1][k_2]\,u_0 = u_0 + \frac{1}{24}(k_1^2 + k_2^2 - 2)\,\delta^2 u_0 \tag{29.21}$$

Equations of this kind underlie formulae in very general use by actuaries
for graduating a series, a process which is very similar to that of fitting a trend-line.
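Formulae (29.20) and (29.21) are exact for a cubic, and this can be verified directly. The sketch below is illustrative only: the cubic `u` is an arbitrary choice of ours, and for an average of even extent the points are taken at half-integers so that the average remains centred on t = 0, which is our assumption about how the symbolic result is to be read.

```python
from fractions import Fraction

def u(t):
    # An arbitrary cubic, chosen only for illustration.
    return 2 * t ** 3 - 3 * t ** 2 + 5 * t + 7

def offsets(k):
    """Positions of k consecutive unit-spaced points centred on zero."""
    return [Fraction(2 * i - (k - 1), 2) for i in range(k)]

def iterated_mean(f, extents):
    """Value at t = 0 of successive simple averages with the given extents."""
    shifts = [Fraction(0)]
    for k in extents:
        shifts = [s + o for s in shifts for o in offsets(k)]
    return sum(f(s) for s in shifts) / len(shifts)

def second_central_difference(f):
    return f(1) - 2 * f(0) + f(-1)

d2 = second_central_difference(u)
single = iterated_mean(u, [5])        # (29.20) with k = 5
double = iterated_mean(u, [4, 5])     # (29.21) with k1 = 4, k2 = 5
print(single, u(0) + Fraction(5 ** 2 - 1, 24) * d2)
print(double, u(0) + Fraction(4 ** 2 + 5 ** 2 - 2, 24) * d2)
```

The same function with extents `[4, 4, 5]` or `[5, 5, 7]` gives the coefficients 9/4 and 4 of the second difference that appear in Spencer's formulae.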

Example 29.2. Spencer's 15-point Formula

Consider three successive averages with equal weights

$$\tfrac{1}{80}[4]^2[5]\,u_0 = u_0 + \tfrac{1}{24}\{4^2 - 1 + 4^2 - 1 + 5^2 - 1\}\,\delta^2 u_0 = u_0 + \tfrac{9}{4}\,\delta^2 u_0.$$

We then have, to third differences,

$$u_0 = \tfrac{1}{80}[4]^2[5]\,(1 - \tfrac{9}{4}\delta^2)\,u_0.$$

Substituting for $\delta^2$ the formula [1, -2, 1], as given by (29.16), we find

$$u_0 = \tfrac{1}{320}[4]^2[5]\,[-9, 22, -9]\,u_0.$$

Without affecting the order of the approximation we may add factors in $\delta^4$ or higher
central differences, and can simplify the numerical coefficients to some extent. Let us
add to the factor [-9, 22, -9] a term $-3\delta^4$ = [-3, 12, -18, 12, -3]. The result
is [-3, 3, 4, 3, -3], giving

$$u_0 = \tfrac{1}{320}[4]^2[5]\,[-3, 3, \mathbf{4}, \ldots]\,u_0.$$

This is Spencer's 15-point formula. It covers sets of 15 consecutive terms, the weights
in full being

$$\tfrac{1}{320}[-3, -6, -5, 3, 21, 46, 67, \mathbf{74}, \ldots]$$
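The full weights of the 15-point formula follow from multiplying out its factors. As a check, here is a minimal sketch in Python that convolves the factor [-3, 3, 4, 3, -3] with a simple sum of five and two simple sums of four; the function names are our own.

```python
def convolve(a, b):
    """Weights of the single operation equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def spencer15():
    """Integer weights of Spencer's 15-point formula (to be divided by 320)."""
    weights = [-3, 3, 4, 3, -3]
    for extent in (5, 4, 4):                 # the order is immaterial
        weights = convolve(weights, [1] * extent)
    return weights

w15 = spencer15()
print(w15, sum(w15))
```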

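Example 29.3. Spencer's 21-point Formula

In a similar way we find

$$\tfrac{1}{175}[5]^2[7] = 1 + 4\delta^2$$

giving, to third differences,

$$u_0 = \tfrac{1}{175}[5]^2[7]\,(1 - 4\delta^2)\,u_0 = \tfrac{1}{175}[5]^2[7]\,[-4, 9, -4]\,u_0.$$

We now add to the factor [-4, 9, -4] the expression

$$-3\delta^4 - \tfrac{1}{2}\delta^6 = [-3, 12, -18, 12, -3] + [-\tfrac{1}{2}, 3, -7\tfrac{1}{2}, 10, -7\tfrac{1}{2}, 3, -\tfrac{1}{2}]$$

giving

$$u_0 = \tfrac{1}{175}[5]^2[7]\,[-\tfrac{1}{2}, 0, \tfrac{1}{2}, 1, \tfrac{1}{2}, 0, -\tfrac{1}{2}]\,u_0 = \tfrac{1}{350}[5]^2[7]\,[-1, 0, 1, \mathbf{2}, \ldots]\,u_0.$$

This is Spencer's 21-point formula.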
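The 21 weights themselves are not written out above, but they can be obtained by expanding the factors. The following Python sketch does so and also checks that the second moment of the weights vanishes, which (with the symmetry of the weights) is what makes the formula exact to third differences. The names used are our own.

```python
def convolve(a, b):
    """Weights of the single operation equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def spencer21():
    """Integer weights of Spencer's 21-point formula (to be divided by 350)."""
    weights = [-1, 0, 1, 2, 1, 0, -1]
    for extent in (5, 5, 7):
        weights = convolve(weights, [1] * extent)
    return weights

w21 = spencer21()
half = len(w21) // 2
# Second moment about the middle term; zero means a quadratic term is reproduced.
second_moment = sum((j - half) ** 2 * w for j, w in enumerate(w21))
print(w21, sum(w21), second_moment)
```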

29.20. A few practical points arising in the application of the foregoing formulae 
are worth mentioning. 

(a) The order in which the iterations are carried out is of course immaterial, as the
reader can easily verify. It is therefore more convenient, as a rule, to carry out the more
complicated operations first, while the numbers being handled remain small. For instance,
in applying the Spencer 15-point formula we should carry out the moving average
[-3, 3, 4, 3, -3] first, then apply the simple average [5], and then the two averages
of four. This does not apply if the series is short, inasmuch as there are fewer of the final
than of the initial operations.

(b) The use of a moving average of extent 2k + 1 involves the absence of k terms at
the end and k terms at the beginning of the trend-series. If the original series is short the
loss may be serious, and this effect sometimes restricts considerably the extent of the
average which we are able to apply.

(c) It is possible to remedy the deficiency at the ends of the series by special formulae,
but the values so derived have less reliability than those of the main trend-line, and on
the whole it seems better to accept the loss of 2k terms unless trend-values for the beginning
and end of the series are really essential.

(d) As yet we have given no guide as to the choice of most suitable values of m and p.
In practice we do not usually require to fit curves of degree higher than five, and often
a cubic is sufficient, as is assumed in the Spencer formulae. There is greater elasticity in
the choice of m, but the point mentioned in (b) above requires m to be as small as possible,
consistent with other requirements. We shall see later in the chapter that the
variate-difference method gives some further guide as to p, and that certain effects of
trend-elimination on random elements bear on the extent determined by m.

(e) There is a voluminous literature on trend-fitting which appears to me out of pro- 
portion to the importance of the subject. It is not difficult to pursue inquiries on the 
above lines to the point of extreme apparent precision and great mathematical complexity, 
and perhaps such work is valuable where the series is fairly smooth and not disturbed 
seriously by sampling variation or superposed random fluctuation. But many of the 
series encountered in statistical practice will not bear the weight of great refinement in 
trend-fitting. The student will probably find that a knowledge of fitting by moving 
averages will be sufficient for all ordinary and many extra-ordinary purposes. 


The Effect of Trend-elimination on Other Components 

29.21. In Table 29.6 we have applied the Spencer 21-point formula to an artificial
series obtained by adding a random element to a cubic. Specifically,

$$u_t = (t - 26) + \tfrac{1}{10}(t - 26)^2 + \tfrac{1}{100}(t - 26)^3 + \varepsilon_t \tag{29.22}$$

The component $\varepsilon_t$ was taken from tables of random numbers and consists of samples from
a population in which all integral values from 0 to 99 are equally frequent. The various
columns of the table illustrate the process of fitting, and we may note in passing that for
a series as short as this it is convenient to leave the more difficult summations to the last
as there are substantially fewer of them.

Now we know that the Spencer formula will fit a cubic exactly, so that when we subtract
the trend from the original series we ought to eliminate the systematic constituent
entirely and be left with our random component, except in so far as we have rounded off the
systematic element to the nearest unit. A comparison of columns (2) and (9) in Table 29.6,
remembering that the latter includes an element 49.5 equal to the mean of the random
component, shows that we do not do so. The reason is not far to seek. The moving
average has acted on the random element itself and determined a trend-line in it.

The results of applying the Spencer 21-point formula to the random element $\varepsilon_t$ are
shown in column (11). We should expect that if the method were perfect the values in
this column would be 49.5, the mean of $\varepsilon_t$, apart from irregular sampling effects; but
not only do the observed values deviate from this mean, they do so systematically, the
values having a small oscillatory movement which is shown as part of the trend.

29.22. This effect can assume considerable importance, particularly if we are eliminating
trend so as to concentrate attention on oscillations. We proceed to examine it more
closely.

Suppose that we have a series composed of the sum of three parts, a trend $\phi_1(t)$, an
oscillatory term $\phi_2(t)$, and a random element $\phi_3(t)$, so that

$$u_t = \phi_1 + \phi_2 + \phi_3. \tag{29.23}$$
TABLE 29.6

Series given by Equation (29.22) with Trend-Line determined by a Spencer 21-point Formula.

 (1)    (2)    (3)    (4)     (5)      (6)       (7)        (8)         (9)       (10)        (11)
  t    Cubic   ε_t    u_t   [5]u_t   [5](5)    [7](6)   [-1, 0, 1,    (8)/350   Deviation   Graduation
       term                                             2, ...](7)              u_t - (9)   of ε_t alone

  1    -119     23    -96
  2    -105     15    -90
  3     -92     75    -17    -246
  4     -80     48    -32    -209
  5     -70     59    -11     -87     -572
  6     -60      1    -59     -42     -241
  7     -51     83     32      12      162
  8     -44     72     28      85      413     2,233
  9     -37     59     22     194      670     3,801
 10     -31     93     62     164      844     5,120
 11     -26     76     50     215      957     5,984     14,352        41          9          67
 12     -22     24      2     186      996     6,642     15,470        44        -42          66
 13     -18     97     79     198    1,078     7,041     15,815        45         34          63
 14     -15      8     -7     233    1,026     7,145     15,676        45        -52          60
 15     -12     86     74     246    1,071     7,038     14,978        43         31          55
 16     -10     95     85     163    1,069     6,934     14,166        40         45          51
 17      -8     23     15     231      948     6,709     13,379        38        -23          47
 18      -7      3     -4     196      850     6,535     12,703        36        -40          43
 19      -6     67     61     112      892     6,408     12,169        35         26          40
 20      -5     44     39     148      853     6,363     12,102        35          4          39
 21      -4      5      1     205      852     6,446     12,279        35        -34          39
 22      -3     54     51     192      944     6,611     12,676        36         15          39
 23      -2     55     53     195    1,024     6,769     13,228        38         15          40
 24      -2     50     48     204    1,031     7,052     13,857        40          8          41
 25      -1     43     42     228    1,015     7,353     14,508        41          1          42
 26       0     10     10     212    1,050     7,610     15,120        43        -33          43
 27       1     74     75     176    1,136     7,923     15,634        45         30          44
 28       2     35     37     230    1,153     8,249     16,251        46         -9          44
 29       4      8     12     290    1,201     8,607     17,002        49        -37          45
 30       6     90     96     245    1,337     9,019     17,717        51         45          44
 31       9     61     70     260    1,357     9,424     18,499        53         17          44
 32      12     18     30     312    1,373     9,870     19,307        55        -25          43
 33      15     37     52     250    1,462    10,429     20,159        58         -6          42
 34      20     44     64     306    1,541    10,989     21,133        60          4          41
 35      24     10     34     334    1,599    11,679     22,417        64        -30          39
 36      30     96    126     339    1,760    12,539     23,797        68         58          38
 37      36     22     58     370    1,897    13,529     25,737        74        -16          37
 38      44     13     57     411    2,047    14,699     27,955        80        -23          36
 39      52     43     95     443    2,233    16,060     30,456        87          8          35
 40      61     14     75     484    2,452    17,570     33,334        95        -20          34
 41      71     87    158     525    2,711    19,353     36,716       105         53          34
 42      83     16     99     589    2,960    21,394
 43      95      3     98     670    3,270    23,690
 44     109     50    159     692    3,680    26,255
 45     124     32    156     794    4,088
 46     140     40    180     935    4,529
 47     158     43    201     997    5,017
 48     177     62    239   1,111
 49     198     23    221   1,180
 50     220     50    270
 51     244      5    249
If we determine the trend by a moving average, denoted by an operation T, then clearly

$$Tu_t = T\phi_1 + T\phi_2 + T\phi_3. \tag{29.24}$$

Let us now suppose that our method of determining trend is perfect in the sense that
$T\phi_1 = \phi_1$. Then, on subtracting (29.24) from (29.23) to eliminate trend, we find

$$u_t - Tu_t = (\phi_2 - T\phi_2) + (\phi_3 - T\phi_3). \tag{29.25}$$

The point of present interest is that the terms $T\phi_2$ and $T\phi_3$ in (29.25) may distort
the genuinely oscillatory parts of the residual series and induce spurious oscillatory
movements.
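The decomposition (29.25) rests only on the linearity of the moving average and on its reproducing the trend exactly. A small Python sketch follows, under assumptions of our own: the cubic `phi1` and the seeded random integers `phi3` are illustrative data, not those of Table 29.6, and the Spencer 21-point weights are written out in expanded form.

```python
import random
from fractions import Fraction

# Expanded Spencer 21-point weights, from the middle term outwards (divisor 350).
HALF = [60, 57, 47, 33, 18, 6, -2, -5, -5, -3, -1]
WEIGHTS = [Fraction(w, 350) for w in HALF[:0:-1] + HALF]

def trend(series):
    """The operation T: weighted moving average over 21 consecutive terms."""
    k = len(WEIGHTS)
    return [sum(w * x for w, x in zip(WEIGHTS, series[i:i + k]))
            for i in range(len(series) - k + 1)]

rng = random.Random(0)                 # fixed seed; illustrative data only
n, m = 51, 10
phi1 = [(t - 26) ** 3 + 2 * (t - 26) ** 2 for t in range(1, n + 1)]  # a cubic trend
phi3 = [rng.randint(0, 99) for _ in range(n)]                        # random element
u = [a + b for a, b in zip(phi1, phi3)]

residual = [x - t for x, t in zip(u[m:n - m], trend(u))]             # u - Tu
random_part = [e - t for e, t in zip(phi3[m:n - m], trend(phi3))]    # phi3 - T(phi3)
print(residual == random_part)
```

The residual series is therefore not the random element itself but the random element less its own graduation, which is the source of the effect examined in the following sections.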


29.23. Consider the simple case when $\phi_2$ is a sine term, $\sin(\alpha + \lambda t)$, t being integral.
Since

$$\sum_{t=1}^{k} \sin(\alpha + \lambda t) = \frac{\sin \tfrac{1}{2}k\lambda}{\sin \tfrac{1}{2}\lambda}\, \sin\{\alpha + \tfrac{1}{2}(k + 1)\lambda\}, \tag{29.26}$$

a simple moving average of k consecutive terms will result in a sine series of the same
period and phase as the original, but with the amplitude reduced by the factor

$$\frac{1}{k}\,\frac{\sin \tfrac{1}{2}k\lambda}{\sin \tfrac{1}{2}\lambda}. \tag{29.27}$$

Iteration q times will reduce the amplitude by the qth power of this factor.

Thus the term $T\phi_2$ will be small if k is large, q is large, or if $\tfrac{1}{2}k\lambda$ is a multiple of $\pi$,
that is, if the extent of the moving average is a period of the oscillation. But if $\lambda$ is small
and $k\lambda$ is small the amplitude is reduced very little and $\phi_2 - T\phi_2$ will largely disappear,
i.e. the moving average will partially obliterate the term in $\phi_2$. In this case, $k\lambda$ being
small, the extent of the moving average is small compared with the period of the harmonic
term, that is to say the oscillation is a slow one. This result is what we should expect.
A slow oscillation is treated as a trend by the moving average and eliminated accordingly.
Generally, the moving average will emphasise the shorter oscillations at the expense of the
longer ones. Furthermore, if the extent of the average is slightly greater than the period,
the term (29.27) may have a negative sign, and consequently the difference from the trend
may somewhat exaggerate the true oscillations.

It is not so easy to exhibit the precise effect of the moving average when the weights 
are unequal and the terms are not harmonic, but evidently the same kind of situation is 
apt to arise. 
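The behaviour of the factor (29.27) is easily tabulated. The sketch below, in Python, compares a directly computed moving average of a sine series with the value predicted by (29.26) and (29.27); the particular phase, period and extent are arbitrary choices of ours.

```python
import math

def amplitude_factor(k, lam):
    """Factor (29.27): amplitude ratio after a simple moving average of k terms."""
    return math.sin(0.5 * k * lam) / (k * math.sin(0.5 * lam))

def simple_moving_average(series, k):
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]

alpha, lam, k = 0.3, 2 * math.pi / 12, 5   # illustrative: period 12, average of fives
wave = [math.sin(alpha + lam * t) for t in range(60)]
smoothed = simple_moving_average(wave, k)
# The average of terms t, ..., t + k - 1 is centred on t + (k - 1) / 2.
predicted = [amplitude_factor(k, lam) * math.sin(alpha + lam * (t + (k - 1) / 2))
             for t in range(len(smoothed))]

for extent in (3, 5, 12, 13):
    print(extent, round(amplitude_factor(extent, lam), 4))
```

With these values the factor vanishes when the extent equals the period (12) and is negative for an extent of 13, in line with the remarks above on exaggeration of the true oscillation.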


29.24. Now consider the effect of a simple moving average (that is, one with equal
weights) on the residual element $\phi_3$, which we will suppose to be a random element $\varepsilon_t$ with
variance v. For the term $T\phi_3$ we have

$$T\phi_3(t) = \frac{1}{k} \sum_{j=-[\frac{1}{2}k]}^{[\frac{1}{2}k]} \varepsilon_{t+j} \tag{29.28}$$

where $[\tfrac{1}{2}k]$ is the greatest integer which does not exceed $\tfrac{1}{2}k$. Consecutive values of $\varepsilon$ are
independent, but consecutive values of $T\phi_3$ are not; for $T\phi_3(a)$ and $T\phi_3(b)$ have
k - (a - b) values of $\varepsilon$ in common and are correlated if a - b < k. Thus the series $T\phi_3$
will be much smoother than $\phi_3$, and if we proceed to further averagings will become smoother
still. We have had an example of this effect in Table 29.6, and shall meet further
examples below.
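The correlation produced by the shared terms can be written down for any set of weights: for independent elements of equal variance it is the sum of products of weights a given lag apart, divided by the sum of squares of the weights. This is a standard result rather than a formula quoted from the text, and the function name below is our own.

```python
from fractions import Fraction

def ma_autocorrelation(weights, lag):
    """Correlation between two terms, `lag` apart, of a moving average of
    independent random elements with a common variance."""
    k = len(weights)
    num = sum(weights[j] * weights[j + lag] for j in range(max(k - lag, 0)))
    den = sum(w * w for w in weights)
    return Fraction(num, den)

simple5 = [1] * 5      # a simple average of fives (the divisor cancels)
print([str(ma_autocorrelation(simple5, h)) for h in range(7)])
```

For equal weights the correlation is (k - h)/k for a lag h less than k, that is, the proportion of elements the two averages have in common, and it is zero thereafter.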
29.25. The effect of taking a moving average of a random series will then be to
generate an oscillatory series, provided that the weights are such as to give a positive
correlation between successive members of the generated series, a condition which is always
realised in moving averages employed for trend-fitting. We shall call this the
Slutzky-Yule effect, after the two statisticians who (independently) studied it in detail.

The generated series is not regular in the cyclical sense, that is to say its peaks and
troughs do not recur at equal intervals of time, and the amplitudes of the oscillations vary
considerably. Nevertheless such oscillations present a striking resemblance to the kind
of movement which is found in practice, particularly in economic time-series, and we shall
consider them in more detail in Chapter 30. For our present purposes we require to consider
how far the process of trend-elimination itself may generate such effects in order
to be sure that oscillatory movements in a trend-free series have not been put there, so
to speak, by our own arithmetical processes.


29.26. For this purpose we shall consider the period and variance of a series generated
by the Slutzky-Yule effect.

Since the peaks and troughs do not recur at equal intervals there is no quantity which
we can conveniently call the length of the oscillation. There will, in fact, be a distribution
of lengths. We may define as the mean length either the mean period from peak to peak,
or that from trough to trough; but this raises some difficulties as to whether we are prepared
to admit as periods small ripples on the main undulation.

Recognising its somewhat arbitrary character, we shall take as our measure of oscillatory
length the mean distance between "upcrosses", that is to say the mean distance
between points where the series changes sign from negative to positive or "crosses the
x-axis". Suppose the series is generated by a moving average with weights $a_1, \ldots, a_k$
of a random variable which is normally distributed with variance v. Then the probability
that

$$u_k = \sum_{j=1}^{k} a_j \varepsilon_j \le 0 \tag{29.29}$$

and

$$u_{k+1} = \sum_{j=1}^{k} a_j \varepsilon_{j+1} > 0, \tag{29.30}$$

that is, that the series changes from negative to positive, is the proportional frequency of

$$\frac{1}{(2\pi v)^{\frac{1}{2}(k+1)}} \exp\Big\{-\frac{1}{2v} \sum_{j=1}^{k+1} \varepsilon_j^2\Big\}\, d\varepsilon_1 \ldots d\varepsilon_{k+1} \tag{29.31}$$

between the hyperplanes $\sum_{j=1}^{k} a_j \varepsilon_j = 0$ and $\sum_{j=1}^{k} a_j \varepsilon_{j+1} = 0$. This is equal to
$\theta/2\pi$, where $\theta$ is the angle between these two planes, which is given by

$$\cos\theta = \frac{\sum_{j=1}^{k-1} a_j a_{j+1}}{\sum_{j=1}^{k} a_j^2}. \tag{29.32}$$

Hence the mean distance between upcrosses is $2\pi/\theta$, where $\theta$ is given by (29.32).
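Formula (29.32) is simple to evaluate for any set of weights. A minimal Python sketch follows (function names our own), applied to the weights [1, 2, 3, 3, 3, 2, 1] of a sum of fives of a sum of threes and to a simple average of nines.

```python
import math

def upcross_angle_deg(weights):
    """Angle of (29.32), in degrees, for a moving average with the given weights."""
    num = sum(a * b for a, b in zip(weights, weights[1:]))
    den = sum(a * a for a in weights)
    return math.degrees(math.acos(num / den))

def mean_distance_between_upcrosses(weights):
    """Mean distance 2*pi/theta, computed as 360 divided by theta in degrees."""
    return 360.0 / upcross_angle_deg(weights)

for w in ([1, 2, 3, 3, 3, 2, 1], [1] * 9):
    print(w, round(upcross_angle_deg(w), 2), round(mean_distance_between_upcrosses(w), 1))
```

Only the ratios of the weights matter, so the divisor of the average may be ignored.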
29.27. In a similar way, the probability that

$$u_k - u_{k-1} > 0, \tag{29.33}$$

$$u_k - u_{k+1} > 0, \tag{29.34}$$

that is, that $u_k$ is a peak of the series, is given by the angle between the two hyperplanes

$$\sum_{j=1}^{k} a_j \varepsilon_j - \sum_{j=1}^{k} a_j \varepsilon_{j-1} > 0 \tag{29.35}$$

$$\sum_{j=1}^{k} a_j \varepsilon_j - \sum_{j=1}^{k} a_j \varepsilon_{j+1} > 0 \tag{29.36}$$

and is given by

$$\cos\theta_1 = \frac{a_1(a_2 - a_1) + (a_3 - a_2)(a_2 - a_1) + \ldots + (a_k - a_{k-1})(a_{k-1} - a_{k-2}) - a_k(a_k - a_{k-1})}{a_1^2 + (a_2 - a_1)^2 + \ldots + (a_k - a_{k-1})^2 + a_k^2} \tag{29.37}$$

Thus the mean distance between peaks is $2\pi/\theta_1$. The same formula obviously applies to
mean distance between troughs.
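One way to evaluate the expression for peaks is to note that the first differences of the generated series are themselves a moving average of the random elements, with weights $a_1, a_2 - a_1, \ldots, a_k - a_{k-1}, -a_k$; the ratio of the sum of products of adjacent weights to the sum of their squares then gives $\cos\theta_1$. The Python sketch below follows that reading, which is our own arrangement of the calculation; the function names are hypothetical.

```python
import math

def first_difference_weights(weights):
    """Weights of the differenced series: a_1, a_2 - a_1, ..., -a_k."""
    padded = [0] + list(weights) + [0]
    return [b - a for a, b in zip(padded, padded[1:])]

def peak_angle_deg(weights):
    """Angle theta_1, in degrees, for a moving average with the given weights."""
    b = first_difference_weights(weights)
    num = sum(x * y for x, y in zip(b, b[1:]))
    den = sum(x * x for x in b)
    return math.degrees(math.acos(num / den))

w = [1, 2, 3, 3, 3, 2, 1]
print(first_difference_weights(w))
print(round(peak_angle_deg(w), 2), round(360.0 / peak_angle_deg(w), 1))
```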


29.28. If we wish to exclude “ripples” of a certain length d from consideration 
we may inquire for the probability that (29.35) and (29.36) are satisfied in conjunction with 

(^9-38) 

This is evidently the area cut off on the unit sphere by the three planes (29.35), (29.36) and

k k 

X ~ X (29.39) 

i=l i=l 

If the angles between the planes are A, B and C this area is $A + B + C - \pi = \theta_2$, say.
The mean length between peaks, ripples excepted, is then $4\pi/\theta_2$.


Example 29.4 

In Table 29.7 we show 480 terms of a series of random numbers which can take integral
values from 0 to 19, together with a moving sum of fives of a moving sum of threes.
Fig. 29.6 shows a portion of the derived series graphically. There are 474 terms of the
smoothed series.

The mean value of our series is 15 × 9.5 = 142.5. The number of upcrosses will be
found from the table to be 23, the first between the 19th and 20th term of the smoothed
series, the last between the 459th and the 460th. The mean distance between upcrosses
is then 440/22 = 20 units. How does this compare with the mean distance given by
"normal" theory?

The weights of the graduation are [1, 2, 3, 3, 3, 2, 1] and from (29.32) we have

$$\cos\theta = \frac{(1 \times 2) + (2 \times 3) + \ldots + (2 \times 1)}{1^2 + 2^2 + \ldots + 1^2} = \frac{34}{37} = 0.9189,$$

$$\theta = 23^\circ\, 14'.$$

Hence the mean distance $= \dfrac{360}{23.233} = 15.5$ units.
 Term   Random   Moving
        number    sum

 422      16      146
 423       9      152
 424      11      144
 425      12      124
 426       2      106
 427       3      106
 428       9      119
 429       6      139
 430      17      159
 431      15      174
 432       5      179
 433      14      172
 434      14      155
 435       9      133
 436       8      107
 437       3       75
 438       1       53
 439       3       55
 440       1       72
 441       5       91
 442      16       96
 443       8       91
 444       2       78
 445       0       75
 446       2       85
 447       7      109
 448      17      124
 449      12      124
 450       5      117
 451       2      106
 452       2       97
 453      15       92
 454       8      100
 455       2      111
 456       4      120
 457      11      121
 458      15      119
 459       8      110
 460       3       98
 461       1       98
 462       4      121
 463      13      150
 464      17      170
 465      19      176
 466       5      169
 467       4      149
 468      15      136
 469       8      137
 470       6      136
 471      14      133
 472       9      126
 473       0      125
 474      15      109
 475       7      103
 476       5       96
 477       1       95
 478      11
 479       5
 480       5
The observed mean distance is 20.0 units, but this is based on rectangular variation, and
we are, perhaps, entitled to expect some difference from normal theory. For rectangular
random variables, values distant from the mean occur more frequently, and it is not
surprising to find oscillations in the series which do not result in upcrosses.

The number of peaks in the series will be found to be 62, the first at the seventh term,
the last at the 466th. Hence the mean distance between peaks is 459/61 = 7.5 units. From
formula (29.37) we find

$$\cos\theta_1 = \tfrac{2}{3}, \qquad \theta_1 = 48^\circ\, 11'.$$

Thus the theoretical mean distance is 360/48.18 = 7.5 units, in good agreement with
experiment. It will be observed that several of the distances between peaks are due to very
small ripples.

From a number of experiments Dodd (1939a) concluded that series generated from 
rectangular material conformed fairly well to normal theory. 


29.29. Let us now examine how the variance of the induced oscillation compares 
with the variance of the original random series. 

The sum of k random elements with variance v has variance kv and its mean has
variance v/k. It does not follow that a simple moving average has a variance 1/k times
that of the random element, because of correlations between successive members in the
derived series. If the original series was $\varepsilon_1 \ldots \varepsilon_n$ the derived series is, with weights
$a_1 \ldots a_k$,

$$a_1\varepsilon_1 + a_2\varepsilon_2 + \ldots + a_k\varepsilon_k,$$

$$a_1\varepsilon_2 + a_2\varepsilon_3 + \ldots + a_k\varepsilon_{k+1}, \quad \text{etc.}$$

The expected value of the sum of these values is zero since the expected value of $\varepsilon$ may be
taken to be so. Since there are n - k + 1 terms we have for the variance

$$\frac{1}{n - k + 1} \sum_{j=1}^{n-k+1} (a_1\varepsilon_j + a_2\varepsilon_{j+1} + \ldots + a_k\varepsilon_{j+k-1})^2. \tag{29.41}$$

The expected value of this, since the $\varepsilon$'s are independent, is

$$(a_1^2 + a_2^2 + \ldots + a_k^2)\, v. \tag{29.42}$$

In particular, if the a's are all equal to 1/k, the expected value of the variance is v/k. This
gives us the average reduction in the variance.

If a simple average of extent k is iterated q times the weights are the successive
coefficients in

$$\frac{1}{k^q}(1 + x + x^2 + \ldots + x^{k-1})^q.$$

The sum of squares of these coefficients is the coefficient of $x^{q(k-1)}$ in

$$\frac{1}{k^{2q}}(1 + x + x^2 + \ldots + x^{k-1})^{2q} = \frac{(1 - x^k)^{2q}}{k^{2q}(1 - x)^{2q}} \tag{29.43}$$

and this gives the average reduced variance for a simple average of k iterated q times.
The following are the values of the reducing factor for some of the values of k and q:

                      q
   k       1       2       3       4       5

   3     0.33    0.23    0.19    0.17    0.15
   4     0.25    0.17    0.14    0.12    0.11
   5     0.20    0.14    0.11    0.10    0.09
   6     0.17    0.11    0.09    0.08    0.07
   7     0.14    0.10    0.08    0.07    0.06

Evidently the result of the first moving average is to generate a series with a much
lower variance than that of the original random element, but the second and succeeding
iterations do not reduce the variance further to the same extent. In the case k = 7 the
first averaging reduces the variance to one-seventh, but the next three reduce it only by
a further half.
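The reducing factor is the sum of squares of the weights of the iterated average, so the table can be recomputed by repeated convolution. A minimal Python sketch, with function names of our own choosing:

```python
from fractions import Fraction

def convolve(a, b):
    """Weights of the single average equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def reducing_factor(k, q):
    """Sum of squares of the weights of a simple average of k iterated q times."""
    weights = [Fraction(1)]
    simple = [Fraction(1, k)] * k
    for _ in range(q):
        weights = convolve(weights, simple)
    return sum(w * w for w in weights)

for k in range(3, 8):
    print(k, [round(float(reducing_factor(k, q)), 2) for q in range(1, 6)])
```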

29.30. To apply such results in practice we require an estimate of the variance of
the random element in the original series. If this is available we can estimate the variance
of the generated series and also, from 29.26, the mean distance between upcrosses or
between peaks. If then our residual series, after the elimination of trend, showed an
oscillatory movement with this variance and these mean distances, within sampling limits, we
could not conclude that the oscillatory effect was real. It could have been induced by
our method of eliminating trend.

In the present state of knowledge it is not possible to assign permissible limits of 
sampling variation by relation to standard errors in the usual way. Whether any particular 
effect is significantly different from the values of the series generated from the random 
element remains, therefore, a matter of subjective judgment to some extent. The sampling 
problems involved are formidable, but there does not seem any reason why they should 
not be capable of explicit solution. This field of study awaits the attention of the theorist. 

Example 29.5 

For the data of Table 29.3 (sheep population of England and Wales) trend was eliminated
by a simple average of nines, the resulting residuals being shown in Table 29.8.
A glance at the series suggests some sort of oscillatory effect, since the signs of terms cluster
together. By the methods of the next chapter the effect may be brought into greater
prominence. The data themselves, however, indicate a mean distance between upcrosses
of about 8 or 9 years, and actual calculation gives a variance of 8474. Can this be due
to the operation of our trend-elimination on a random element in the original series?

For the mean distance between upcrosses due to a simple nine-point average we have

$$\cos\theta = \tfrac{8}{9}, \qquad \theta = 27^\circ\, 16',$$

and the mean distance is 360/27.27 = 13.2 approximately. This is considerably in excess of
our observed value, but not sufficiently so to reject outright the possibility we are examining.

Since, however, the variance of residuals is 8474 this must, to have been generated
from a random series by a simple average of nines, derive from a random element with
TABLE 29.8

Residual Values of the Sheep Series of Table 29.3 after Elimination of Trend by a Simple
Nine-Point Moving Average.

 Year   Residual      Year   Residual      Year   Residual
        (10,000)             (10,000)             (10,000)

 1871    - 176        1893    +  34        1915    +  19
   72    - 112          94    - 103          16    + 128
   73    +  50          95    - 104          17    +  97
   74    + 141          96    -  15          18    +  69
   75    +  60          97    -  23          19    -  29
   76    -  20          98    +  17          20    - 174
   77    +  12          99    +  71          21    - 107
   78    +  82        1900    +  35          22    - 142
   79    + 130          01    +  16          23    - 109
   80    -  14          02    -  27          24    -  23
   81    - 166          03    -  32          25    +  60
   82    - 179          04    -  49          26    + 121
   83    -  84          05    -  61          27    +  94
   84    +  38          06    -  52          28    -  26
   85    +  97          07    -  24          29    -  90
   86    +   8          08    +  68          30    -  75
   87    -   6          09    + 141          31    +  72
   88    - 105          10    + 119          32    + 162
   89    -  99          11    +  66          33    + 112
   90    +  35          12    -  52          34    -  64
   91    + 159          13    - 117          35    -  87
   92    + 167          14    -  61
variance 76,266. An estimate of the variance of the random element in the original series, 
obtained by the variate-diiFerence method which we describe below, was only 350 approxi- 
mately. Making every allowance for sampling effects, we cannot do otherwise than reject 
decisively the possibility that the residual oscillation is spurious in the sense of having 
been induced into the data by the effect of the elimination of trend on a random element. 
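The arithmetic behind the figure 76,266 may be sketched as follows, assuming (as the argument does) that a simple average of k terms of a random series has 1/k of the variance of the original. The function name is hypothetical.

```python
def implied_random_variance(residual_variance, extent):
    """Variance a random element would need for a simple average of
    `extent` terms to show `residual_variance` (factor 1/extent assumed)."""
    return residual_variance * extent

implied = implied_random_variance(8474, 9)
estimate_from_differences = 350      # variate-difference estimate quoted in the text
ratio = implied / estimate_from_differences

print(implied)         # variance the random element would have to possess
print(round(ratio))    # how many times larger than the estimate actually found
```

The implied variance exceeds the variate-difference estimate by a factor of more than two hundred, which is the basis of the decisive rejection.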

29.31. We may summarise the foregoing discussion of trend-elimination as follows : — 

(a) The conception of a trend as a “smooth” or “regular” movement is equivalent 
to the supposition that trend can be represented, at least locally, by a smooth mathematical 
function and in particular by a polynomial in the time-variable. 

(b) Certain series can be treated on lines formally equivalent to regression analysis; 
but a more generally applicable procedure is to represent the trend by a moving 
parabolic arc. 

(c) The moving arc of best fit in the least-squares sense gives values which are deriv- 
able from a moving average of the data. The weights of this average are to some extent 
at choice, according to the extent of the average and the closeness of fit required in the 
moving arc. 

(d) A moving average of extent k sacrifices (k - 1) terms, in the sense that the derived 
series is (k - 1) terms shorter than the original series. If the series is short it is usually 
desirable to keep this loss to a minimum, that is, to keep the extent of the average as 
short as possible. 

(e) A moving average may distort genuine oscillatory effects, in general exaggerating 
the shorter variations at the expense of the longer ones, and may induce spurious oscillatory 
phenomena by its action on random residuals. For harmonic components the effect is 
minimised by taking the average as simple, with extent equal to the period of the com- 
ponent. For random components the effect is minimised by making the sum of squares 
of weights in the average a minimum, i.e. by using a simple average. 
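Points (d) and (e) can be illustrated by a short sketch. The five-point weights (-3, 12, 17, 12, -3)/35 are the familiar least-squares parabolic weights and are introduced here only as an example; the function names are hypothetical.

```python
def moving_average(series, weights):
    """Apply a moving average with the given weights; the result is
    len(weights) - 1 terms shorter than the input series."""
    k = len(weights)
    return [
        sum(w * x for w, x in zip(weights, series[i:i + k]))
        for i in range(len(series) - k + 1)
    ]

def variance_factor(weights):
    """Factor by which the variance of a random series is multiplied:
    the sum of squares of the weights."""
    return sum(w * w for w in weights)

simple_5 = [1 / 5] * 5
parabolic_5 = [w / 35 for w in (-3, 12, 17, 12, -3)]   # illustrative weights

trend = moving_average(list(range(20)), simple_5)
print(len(trend))                      # 20 - (5 - 1) terms survive
print(variance_factor(simple_5))       # smallest possible for weights summing to unity
print(variance_factor(parabolic_5))    # larger: closer fit, less smoothing of the random part
```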

29.32. In the theory of time-series there are very few rules which can be laid down 
without a good deal of proviso and caveat. It will be evident from the foregoing that there 
is no golden rule in trend-fitting which can be applied irrespective of individual circum- 
stances. If we desire to get a close fit to the data we must use a parabola of fairly high 
order, which involves a moving average with weights which are far from equal. This, 
however, increases the danger of obscuring the true oscillations in the residuals. In 
most practical cases it is necessary to strike a balance between conflicting requirements 
by intuitive judgment as to the appropriate moving average to use. 

The Variate-difference Method 

29.33. We now proceed to consider the random constituent of a time-series. From 
the very nature of random variation we cannot expect to derive any formula, however 
approximate, which will measure the random component directly at any given point of 
the series. The best we can hope to do is to determine the non-random components and 
to obtain a random residual which is left unaccounted for by those components ; and even 
this, as we shall see in the next chapter, is not a very strong hope when oscillations appear 
in the series. 

On certain assumptions, however, we may determine the variance of the random 
component and hence obtain a general idea of its magnitude and importance. Suppose 
that the systematic part of the series can be represented, at least locally, by a polynomial. 
Then successive differencing of the series will gradually eliminate the polynomial element 
but will not reduce the random element correspondingly. As we proceed with the differencing, 
the random element becomes more and more predominant until finally the systematic 
component is negligible. Hence we can determine effectively the variance of the 
random component in the differenced series, and by a simple calculation derive an estimate 
of that in the original series. 
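The contrast between the two constituents under differencing can be seen in a small sketch. The cubic used is a hypothetical systematic part and the random integers are generated only for illustration.

```python
import random

def difference(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]

def nth_difference(series, order):
    """Differences of the given order."""
    for _ in range(order):
        series = difference(series)
    return series

cubic = [t ** 3 - 2 * t for t in range(12)]    # hypothetical systematic part
third = nth_difference(cubic, 3)               # constant for a cubic
fourth = nth_difference(cubic, 4)              # vanishes for a cubic

rng = random.Random(0)
noise = [rng.randint(1, 100) for _ in range(12)]
noise_fourth = nth_difference(noise, 4)        # random part does not die away

print(third)
print(fourth)
print(noise_fourth)
```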

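29.34. Consider the differencing of a random series $\varepsilon_1, \varepsilon_2, \ldots$ We have 

$$\Delta \varepsilon_t = \varepsilon_{t+1} - \varepsilon_t \qquad (29.44)$$

$$\Delta^r \varepsilon_t = \varepsilon_{t+r} - \binom{r}{1}\varepsilon_{t+r-1} + \binom{r}{2}\varepsilon_{t+r-2} - \ldots + (-1)^r \varepsilon_t . \qquad (29.45)$$

Without loss of generality we may suppose that the mean value of $\varepsilon$ is zero, and thus 

$$E(\Delta^r \varepsilon_t) = 0 \qquad (29.46)$$

$$\begin{aligned}
\operatorname{var}(\Delta^r \varepsilon_t) &= E(\Delta^r \varepsilon_t)^2 \\
&= E\left\{\varepsilon_{t+r} - \binom{r}{1}\varepsilon_{t+r-1} + \ldots + (-1)^r \varepsilon_t\right\}^2 \\
&= E\left\{\varepsilon_{t+r}^2 + \binom{r}{1}^2 \varepsilon_{t+r-1}^2 + \ldots + \varepsilon_t^2\right\} \\
&= v\left\{1 + \binom{r}{1}^2 + \binom{r}{2}^2 + \ldots + 1\right\},
\end{aligned}$$

where $v$ is the variance of $\varepsilon$, the product terms vanishing because the members of a random series are independent. 

The sum in curly brackets is easily evaluated from the consideration that it is the coefficient 
of $x^r$ in $(1 + x)^r (x + 1)^r$, that is, equals $\binom{2r}{r}$. Hence 

$$\operatorname{var}(\Delta^r \varepsilon_t) = v \binom{2r}{r} . \qquad (29.47)$$

We may then derive an estimate of $v$ by writing 

$$v = \frac{\operatorname{var}(\Delta^r \varepsilon_t)}{\binom{2r}{r}} . \qquad (29.48)$$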


It is to be noticed that we use the second moment about zero, not the observed variance 
of $\Delta^r \varepsilon_t$, since the mean is known to be zero. This shortens the arithmetic to some extent. 


The factor $1/\binom{2r}{r}$ for r = 1 to 10 has the following values: 

  r     $\binom{2r}{r}$     $1/\binom{2r}{r}$
  1           2       0.5
  2           6       0.166,667
  3          20       0.05
  4          70       0.014,285,7
  5         252       0.003,968,25
  6         924       0.001,082,25
  7       3,432       0.000,291,375
  8      12,870       0.000,077,700,1
  9      48,620       0.000,020,567,7
 10     184,756       0.000,005,412,54


29.35. Basing itself on equation (29.48) the method of variate-differences proceeds 
as follows : We difference the series once, find the second moment about zero of the result- 
ant and divide by 2 ; we then difference again and find the second moment about zero, 
dividing in this case by 6 ; and so on. If the successive estimates of v decrease, we con- 
tinue with the differencing. There will, in general, come a point when they cease decreasing 
and remain constant within sampling limits (which may be rather wide). At this stage 
we may suppose that we have eliminated the systematic element in the original series. 
The final estimate gives us an estimate of the variance of the random element in the original 
series, and the order of the difference to which we have had to go will give an indication 
of the degree of the polynomial representing the systematic component. 
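The procedure just described may be sketched in code. The test series below (a quadratic trend plus random integers from 1 to 100) is hypothetical and much longer than the series considered in the text; the function name is likewise an assumption made for illustration.

```python
import random
from math import comb

def variate_difference_estimates(series, max_order):
    """Return {r: estimate of v} for r = 1..max_order, each estimate being the
    second moment about zero of the r-th differences divided by C(2r, r)."""
    estimates = {}
    current = list(series)
    for r in range(1, max_order + 1):
        current = [b - a for a, b in zip(current, current[1:])]
        second_moment = sum(d * d for d in current) / len(current)
        estimates[r] = second_moment / comb(2 * r, r)
    return estimates

rng = random.Random(1)
n = 5000
# Hypothetical series: quadratic systematic part plus a random element
series = [0.05 * t * t + rng.randint(1, 100) for t in range(n)]

estimates = variate_difference_estimates(series, 4)
for r, v in estimates.items():
    print(r, round(v, 1))
```

With a quadratic systematic part the estimate falls sharply between the first and second differences and thereafter stays near the variance of the random element, which for integers from 1 to 100 is 833.25.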

Example 29.6 

Let us apply the variate-difference technique to the series of Table 29.6. We know 
from the method of constructing the series that the systematic part ought to be completely 
eliminated after the third differencing, and also that the random part consists of an element 
with variance 833 approximately. In fact, the random numbers from 1 to N have a 
variance $(N^2 - 1)/12$ and N in this case is 100. The actual variance of the random element 
in Table 29.6 is 843. 
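The value 833 follows directly from the formula for the variance of the integers 1 to N, as the following exact check shows (the function name is illustrative).

```python
from fractions import Fraction

def variance_of_integers(n):
    """Exact variance of the integers 1..n taken with equal probability."""
    mean = Fraction(n + 1, 2)
    return sum((Fraction(k) - mean) ** 2 for k in range(1, n + 1)) / n

n = 100
exact = variance_of_integers(n)
formula = Fraction(n ** 2 - 1, 12)     # (N^2 - 1)/12

print(exact, formula, float(exact))
```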
TABLE 29.9 

Differences of the Series $u_t$ of Table 29.6. 


t 


1 

-90 

2 

-90 

3 

-17 

4 

-32 

f) 

-11 

({ 

-79 

7 

32 

H 

28 

9 

22 

10 

02 

U 

70 

12 

2 

13 

79 

14 

7 

1.7 

4 

10 

87 

17 

17 

IS 

4 

1!) 

01 

20 

39 

21 

1 

22 

71 

23 

73 

24 

4S 

2.7 i 

12 

20 

10 

27 ! 

ir> 

2K 1 

37 

2!) 

12 

30 I 

9(; 

31 

70 

32 

30 

33 

72 

34 1 

04 

37 j 

34 

30 j 

120 

37 j 

78 

3S i 

77 

3!) ; 

97 

■10 I 

77 1 

41 I 

178 I ■ 

42 , 

99 : 

43 j 

98 

•14 ! 

1 79 

47 

1,7 (i 

40 

180 

47 

201 

4S 

239 

•19 i 

221 

70 

270 

71 1 

249 




— 6 

67 

-73 

- 88 

15 

36 

-21 

- 69 

48 

139 

-91 

- 95 

4 

- 2 

0 

46 

-40 

- 52 

12 

- 36 

48 

125 

-77 

-163 

80 

167 

-81 

- 70 

-11 

- 81 

70 

51 

19 

84 

-07 

- 87 

22 

- 16 

38 

88 

-•70 

- 48 

— 2 

- 7 

7 

„ 1 

0 

- 26 

32 

97 

• 07 

-103 

38 

13 

27 

109 

84 

- 110 

2() 

14 

40 

02 

- 22 

- 10 

-12 

- 42 

30 

122 

- - 92 

-lOO 

08 

07 

1 

39 

-38 

- 58 

20 

103 

-83 

-142 

79 

78 

1 

02 

-01 

- 64 

3 

27 

- 24 

- 3 

-21 

17 

-38 

- 56 

18 

67 

-49 

- 70 

21 

. . • 


AK 

AK 

155 

279 

-124 

-229 

105 

313 

-208 

-442 

234 

327 

- 93 

- 46 

- 48 

-146 

98 

114 

- 16 

145 

-161 

-449 

288 

618 

-330 

-567 

237 

226 

11 

143 

-132 

- 99 

- 33 

-204 

171 

242 

- 71 

33 

-104 

-240 

136 

177 

- 41 

- 35 

- 6 

- 31 

25 

148 

-123 

-323 

200 

316 

-116 

- 20 

— 96 

— 315 

219 

315 

- 96 

- 20 

- 7(5 

-148 

72 

40 

32 

196 

-164 

— 446 

282 

509 

-227 

-255 

28 

— 69 

97 

258 

-161 

-406 

245 

445 

-200 

-196 

- 4 

-130 

126 

217 

- 91 

-121 

30 

50 

- 20 

- 93 

73 

196 

-123 

-260 

137 



Z|5. 

A^. . 

508 

1050 

- 542 

-1297 

755 

1524 

- 769 

-1141 

372 

271 

101 

361 

- 260 

- 229 

- 31 

- 625 

594 

1661 

-1067 

-2252 

1185 

1978 

- 793 

- 876 

83 

- 159 

242 

137 

105 

551 

- 446 

- 655 

209 

- 64 

273 

690 

- 417 

- 629 

212 

216 

- 4 

175 

- 179 

- 650 

471 

1110 

- 639 

- 975 

336 

41 

295 

925 

- 630 

- 965 

335 

207 

128 

316 

- 188 

- 32 

- 156 

- 798 

642 

1597 

- 955 

-1719 

764 

950 

- 186 

141 

- 327 

- 991 

664 

1515 

- 851 

-1492 

641 

707 

- 66 

281 

— 347 

- 685 

338 

509 

- 171 

- 314 

143 

432 

- 289 

- 745 

456 

. . • 

Table 29.9 shows the series and the differences up to $\Delta^6$. For the sums of squares 
in the various columns, $S_j$ corresponding to $\Delta^j$, we find: 

 $S_1$ =    107,541 
 $S_2$ =    318,115 
 $S_3$ =  1,033,513 
 $S_4$ =  3,445,308 
 $S_5$ = 11,720,069 
 $S_6$ = 40,548,844 

To obtain second moments we divide by 51 - j and then, to obtain the estimate of v, 
by $\binom{2j}{j}$, with the following results: 

  j    Estimate.
  1    1075.41
  2    1082.02
  3    1076.58
  4    1047.21
  5    1011.05
  6     975.20


Curiously enough, the estimate for j = 2 is higher than that for j = 1 and there is 
little difference between the various estimates. In the ordinary way we should have 
concluded that the systematic component was adequately represented by a polynomial 
of order 1, that is to say a straight line, and that the residual random element had a variance 
of about 1000. 
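The estimates of this example can be reproduced from the sums of squares, taking the series to have 51 terms so that the j-th differences number 51 - j. The names below are illustrative.

```python
from math import comb

n_terms = 51
sums_of_squares = {
    1: 107_541,
    2: 318_115,
    3: 1_033_513,
    4: 3_445_308,
    5: 11_720_069,
    6: 40_548_844,
}

# Second moment about zero, then division by C(2j, j)
estimates = {
    j: s / (n_terms - j) / comb(2 * j, j)
    for j, s in sums_of_squares.items()
}

for j, v in estimates.items():
    print(j, f"{v:.2f}")
```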

The reader must not be surprised to find discrepancies of this kind between theory 
and experiment in short series; and the discrepancy is not, in fact, as big as it seems. 
The variance of the original series is 6272.61. The mean square of the first difference, 
divided by 2, is 1075.41, so that about five-sixths of the variance has been eliminated by 
the first differencing, and the method indicates, quite correctly, that the greater part of 
the systematic element is linear. The random element is rather large compared with the 
non-linear systematic terms, and the latter have got caught up in it: the series is too short 
for the variate-difference method to disentangle them. Consider, for instance, the cubic 
term $\frac{1}{100}(t - 26)^3$. In the original series this varies in value from -156.25 to +156.25. 
First differences reduce it to $\frac{3}{100}(t - 26)^2$, varying from 18.75 through zero to 18.75, 
whereas the random element is increased in range from 0 to 198. Already the systematic 
term is being swamped by the random element, and a slight degree of accidental correlation 
between the two can easily account for the increase in the mean-square of second differences. 
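The orders of magnitude in this argument can be checked as follows. The coefficient 1/100 of the cubic term is an assumption, inferred from the quoted extreme values of 156.25; the other figures are those given in the text.

```python
variance_original = 6272.61
estimate_first_difference = 1075.41
fraction_eliminated = 1 - estimate_first_difference / variance_original

# Assumption: cubic term (t - 26)**3 / 100 for t = 1, ..., 51
cubic = [(t - 26) ** 3 / 100 for t in range(1, 52)]
first_diff = [b - a for a, b in zip(cubic, cubic[1:])]

print(round(fraction_eliminated, 3))       # close to five-sixths
print(min(cubic), max(cubic))              # range of the cubic term
print(min(first_diff), max(first_diff))    # range after one differencing
```

The exact first differences are slightly smaller at the ends than the value 18.75 given by the expression in $(t - 26)^2$, which treats the difference as a derivative; the conclusion that the cubic term is swamped after one differencing is unaffected.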

The matter may be put in a slightly different way. Suppose that, relying on the 
variate-difference method, we regarded the data as represented by a linear equation plus 
a random residual. If we fitted a straight line by least squares and examined the residuals, 
we should probably find very little evidence of departure from randomness. This representation 
would differ from the mode of construction of the series, but it would be a possible 
method of construction. Only the failure of the representation to conform to further 
terms of the series would reveal its weakness. 

29.36. The variate-difference method thus provides a kind of lower limit to the 
degree of the polynomial which will represent a series locally or generally. There remains 
for consideration the question as to what sort of differences between successive estimates 
of v can be regarded as chance effects, in order to decide when the value has reached a 
stationary level. The sum of squares $S_r$ is a constant factor times the second moment, 
but as its members are correlated among themselves we cannot use the variance of the 
second moment to test its significance. Further, $S_r$ and $S_{r+1}$ are correlated. We proceed 
to derive the sampling variance of their difference, the somewhat complicated formulae 
being due to Anderson (1914). 


29.37. Write 

$$b_j = \binom{r}{j} .$$

Then we have, as in (29.42), 

$$E(\Delta^r u_t)^2 = \mu_2 (b_0^2 + b_1^2 + \ldots + b_r^2) \qquad (29.49)$$

$$= \mu_2 \binom{2r}{r}, \qquad (29.50)$$

where $\mu_2$ is the variance of u. Further 
E (/h uY E [ {ho — bi Uj. 4- bz Ur-l — ■ • • + ( — I)*" ^>r 

“I' {^0 ^r+1 . . . -f- (— I)*" by Ui,Y 

1 - ... 

1 - K '>^n~rYY- • (29.51) 

Consider first of all the terms in this which result in fourth powers of u. They will 
derive from 


E {h-i 


rl 1 i' '''4 4- • * • + blu\ 4- ^->0 '*^r+l 4" • • • + '^1 +■ • 

\ < -v^A<-v +• ■ ■ +blui_yY 

E {lYi, 1 M'l) |- (^>(1 I- bY 'M'i) + (&0 + + bl) 4- ^ 3 ) + • • 

4 (6(1 I b'f I- . . . 6;_,) ('tt(i-rM + '*^r) + (M + + • • • + bl) 


{<.-r 

Writing now 


u 


'n — r— I 


- • - -i- ' 4 -n)A 

14 ... {fYY d- (/4I + b‘iY d- - • • d- {b^ 4- d- ■ - • + 

.•15 - + . . . = 


. (29.52) 

. (29.63) 
. (29.54) 


we see that the term in $E(u^4)$ is 

{At {n - 2r) -f 2Bt} E {u^) (29.55) 

'"I'lio only other term appearing from (29.51) will be of type E {uj I ^m. If the reader 
will write out the exj)ansion of (29.51) he will find that the coefficients are expressible in 

terms of 

Aj {bo b^ d- 61 «>;+i 4 - - . • + by_^ b^Y = ^ 

and 

Bj^^^ibohY'^-i-ibobj-i-b.bj+^Y-^ • • • d - (60 ^-d-^i ^-+1 d- • ■ • d- ^>r-,--i &r-i)"- • (29-57) 
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The expression for E {A'^ uY reduces to— 

{n~2r)AlE{u^) -f 4 { (% - 2r + 1) -f (w - 2r -f 2) + . . . 

-\- Al{n - 2r ^ r)]E {uf ul,) -f 2Rg E {u^) 

+ ® {^1 + -S| + • • • + ■{- Bl]E {ul u^. . . . (29.58) 

Substituting $\mu_4$ for $E(u^4)$ and $\mu_2^2$ for $E(u_l^2 u_m^2)$, dividing by $(n - r)^2 \binom{2r}{r}^2$ and subtracting 
$\mu_2^2$, we find the sampling variance of the estimate of v. The expression can, however, be 
simplified to some extent. Putting 




+ 


-f r 


we find, after lengthy algebraic rearrangement, 

O 2 fi 

/^4 3/^2 I ^ — 


0 



(29.59) 


var 


S. 


{n-r)(^ 


2r' 

r 


n 


+ 


n — f 


(n — r) 

' 4r\ 

, 2r / 


2r\2 


'2ry 

r / 


2 {n — r) 


r < !•%. 


(29.60) 


If terms of order {n — r) ^ can be neglected, this reduces to 

fij — 3/^2 2^2 


n 


(29.61) 


^ 2r ^ ^ — r’ 

or, using the Stirling approximation to factorials, 

1 

{^4 - 3/^1 -f f4 V(2r7r) },.... (29.62) 

which is a fair approximation to (29.61), being within 3 per cent for r as low as 6. 

When the population of values of u is normal, $\mu_4 - 3\mu_2^2$ vanishes and the formula 
simplifies accordingly. 

29.38. In a similar way it may be shown that 


8. 


8 


'r+l 


cov 


{n — r) 


' 2r 


{n~r ~l) 

__ yMa — 3^1 I 1 


/2r + 2\ 

\r + l). 


2t: 
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(29.63) 
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where 
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j + 2 
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f r -Y 1\ 

\i) ' 

+ 3/ 


+ • . . -f r 
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\o; Vr + l/ 


From (29.60) and (29.63) we can determine the variance of the difference of 

$$\frac{S_r}{(n - r)\binom{2r}{r}} \quad \text{and} \quad \frac{S_{r+1}}{(n - r - 1)\binom{2r+2}{r+1}} .$$

The general formula is complicated, but for normal variation, large n and r > 6 we have 
analogously to (29.62), 

$$\operatorname{var}\left\{\frac{S_r}{(n - r)\binom{2r}{r}} - \frac{S_{r+1}}{(n - r - 1)\binom{2r+2}{r+1}}\right\} = \frac{(3r + 1)\sqrt{(2\pi r)}}{2(2r + 1)^3 (n - r - 1)}\,\mu_2^2 . \qquad (29.64)$$

The arithmetic application of the formulae has been facilitated by the preparation of tables 
of the constants involved. Reference may be made to Tintner (1940) who gives tables 
prepared by himself, Anderson and Zaycoff. 


Example 29.7 

For the data of Table 29.3 (sheep population) an application of the variate-difference 
method up to the tenth difference gave the following results: 

  r     $S_r / \{(n - r)\binom{2r}{r}\}$
  1     3468
  2     1442
  3      854
  4      629
  5      518
  6      448
  7      401
  8      371
  9      357
 10      347

The values here are falling steadily from r = 1 to r = 10, but very slightly towards 
the end. From (29.64) for r = 6 we have for the variance of the difference, 80.7 approximately 
and for r = 10, 25.8 approximately. It appears that the reduction in variance 
at r = 10 is losing significance, and that a moving arc of degree 10 would be sufficient to 
eliminate the systematic component. It does not, of course, follow that the trend-line 
must be of this degree, for we may not want to eliminate the oscillatory movements in 
the trend-line. 
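The two variances quoted in this example can be recovered numerically. The sketch reads (29.64) as $\mu_2^2 (3r+1)\sqrt{2\pi r} / \{2(2r+1)^3(n-r-1)\}$, substitutes for $\mu_2$ the estimate at the order concerned, and assumes that the sheep series has n = 73 terms; the length is an inference from the years covered by Table 29.8 and is not stated in this passage.

```python
from math import pi, sqrt

def variance_of_difference(mu2, r, n):
    """Variance of the difference of successive estimates, (29.64) as read
    here: normal variation, large n and fairly large r assumed."""
    return mu2 ** 2 * (3 * r + 1) * sqrt(2 * pi * r) / (
        2 * (2 * r + 1) ** 3 * (n - r - 1)
    )

n = 73                                            # assumed length of the series
var_6 = variance_of_difference(448, 6, n)         # estimate of v at r = 6
var_10 = variance_of_difference(347, 10, n)       # estimate of v at r = 10

print(round(var_6, 1), round(var_10, 1))

# Fall in the estimate from r = 9 to r = 10, in units of the standard error at r = 10
drop_in_standard_errors = (357 - 347) / sqrt(var_10)
print(round(drop_in_standard_errors, 1))
```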


29.39. The variate-difference method will clearly not eliminate systematic effects 
such as periodic terms with very short period. Consider, for instance, the series 1, -1, 
1, -1, etc. The first differences give us a series 2, -2, 2, -2, etc., second differences 
4, -4, 4, -4, etc., and so on. The variance of the series of rth differences is, neglecting 
effects due to the shortness of the series, $2^{2r}$ times that of the original, and the quotient 
when this is divided by $\binom{2r}{r}$ is $2^{2r}(r!)^2/(2r)!$, which tends to $\sqrt{(\pi r)}$ 
and so increases without limit. In such a case we cannot obtain an estimate of the variance 
of any random element which may be present. 
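The growth of the quotient for the alternating series can be exhibited directly. This is an illustrative sketch; the series length of 40 is arbitrary.

```python
from math import comb

def nth_difference(series, order):
    """Differences of the given order."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

alternating = [(-1) ** t for t in range(40)]      # 1, -1, 1, -1, ...

quotients = {}
for r in range(1, 7):
    d = nth_difference(alternating, r)
    second_moment = sum(x * x for x in d) / len(d)    # every term is +/- 2**r
    quotients[r] = second_moment / comb(2 * r, r)
    print(r, second_moment, round(quotients[r], 3))
```

The successive "estimates" rise steadily instead of settling down, so that no stationary value is reached.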


NOTES AND REFERENCES 

References to the fitting of polynomials are given at the end of Chapter 22. For the 
moving average see Whittaker and Robinson’s Calculus of Observations and the books by 
Macaulay (1931) and Sasuly (1934). 

Attempts have been made to use trend-lines for purposes of forecasting, and even to 
measure the standard error of a forecast — see Schultz (1930) and a discussion in Davis 
(1941). The methods proposed appear to me theoretically unsound and in practice they 
lead as a rule to such wide limits of error as to be of doubtful value ; but this is a personal 
opinion and the less sceptical reader may care to consult Davis’s book and to follow up 
the references given therein. 

For the effect of moving averages on random variables see Yule (1921) and Slutzky 
(1937b), the latter being an English version of a paper published in Russian many years 
earlier. See also Dodd (1939a, 1941a). Slutzky proves an interesting theorem, the 
theorem of the sinusoidal limit, to the effect that repeated moving averages of certain 
kinds applied to random series generate a sine-curve. 

For the variate-difference method see the book by Tintner (1940), a very thorough 
practical account with useful tables. The more important earlier memoirs are those by 
Anderson (1914, 1923, 1926), “Student” (1914), Morant (1921), and K. Pearson and 
Cave (1914). 

EXERCISES 

29.1. Show that in the formulae of equation (29.7) and similar formulae of higher 
orders the sum of the weights is unity. 

29.2. By evaluating the solutions of (29.5) determinantally show that a parabolic 
curve of second or third order giving a graduation 

$$a_{-n} u_{-n} + \ldots + a_0 u_0 + \ldots + a_n u_n$$

has 

$$a_s = \frac{3\{3n^2 + 3n - 1 - 5s^2\}}{(2n - 1)(2n + 1)(2n + 3)} .$$

29.3. Show that the weights in the Spencer 21-point formula are 

$$\tfrac{1}{350}\,[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, \ldots]$$

and that if it is applied to a random series the variance of the resultant is about one-seventh 
of the original series, about the same reduction as would be given by a simple moving 
average of sevens. 


29.4. Show that Macaulay’s 43-point formula. 


[ 12 ] [ 8 ] [ 5 ]^ 


has weights 

1 


10 ’ 


1 , 0 , 0 , 0 , 0 , 0 , 0 , 1, . . . 


18, 30, 40, 45, 28, - 8, - 60, - 122, - 178, - 205, - 190, - 127, 

- 6, 163, 360, 562, 760, 928, 1050, 1127, 1156, . . .] 

and that it reduces the variance of a random series about as much as a simple average 
of nines. 

29.5. ^ Take^a random series of, say, 200 terms and determine “ trends ” by moving 

aveiages ^[J], gj[9]“ and Compare the mean distances between peaks and 

upcrosses with the theoretical values based on normal theory. 

29.6. If $\varepsilon_t$ is a random series, show that the correlation between successive members 
of $\Delta^k \varepsilon_t$ for long series is $-\dfrac{k}{k+1}$ and hence tends to -1 as k increases. Hence show 
that the signs of successive terms in $\Delta^k u_t$ tend to alternate, where $u_t$ is the sum of a random 
element and a systematic element representable by a polynomial; and verify by reference 
to Table 29.9. 

29.7. eliminating 5“ from (29.19) show that, lor a cubic curve, an accurate trend- 
line IS given liy 


1. _ f/e^-l 

hr - - kr 1 Ic 


h 




and generalise this result. 

(Cf. J. A. Higham, J. Inst. Act. (1882-5), 23, 335; 25, 15, 245.) 
30.1. The present chapter is devoted to a discussion of oscillatory effects in time-series. 
We shall suppose that our series is stationary, i.e. has no trend, either because the 
original data contained none or because trend has been removed by one of the methods 
described in the last chapter. Our typical series will then fluctuate round some constant 
value which we may usually, without loss of generality, take to be zero. We shall assume 
that there is a prior possibility that part of the variation at least is random. This, indeed, 

TABLE 30.1 

Trend-free Wheat-Price Index (European Prices) compiled by Sir William Beveridge for 
the Years 1500-1869. 


(From Beveridge, 1921.) 
is necessary if our results are to have any practical application, for most of the series 
encountered in practice have some element of irregularity, however small. 

30.2. Examples of the type of series under consideration have already occurred. 
The table of Example 21.11 (page 126) gives the deviations from a simple nine-year moving 
average of the yields of potatoes in tenths of tons per acre in England and Wales for a 
run of years ending in 1935. Table 29.1 (Fig. 29.1) gives the annual yields of barley in cwts. per 
acre in England and Wales for 1884-1939, no elimination of trend having been 
carried out in this case. Table 29.4 (Fig. 29.4) gives rainfall data at London over the 
century 1813-1912. Table 29.5 (Fig. 29.5) gives egg-production per laying hen in the 


TABLE 30.2 

Marriage Rate in England and Wales : Deviation from a Simple 11-Year Moving Average 

for the Years 1843-1896. 

Units 1 in 10,000. 
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Year. 

Marriage 

Rate. 

Year. 

Marriage 

Rate. 

Year. 

Marriage 

Rate. 

1843 

- 6 

1861 

- 5 

1879 

- 12 

44 

1 

62 

- 7 

80 

- 5 

45 

12 

63 

1 

81 

0 

45 

47 

10 

64 

6 

82 
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- () 

65 

8 
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7 

48 

— 8 

66 

9 

84 
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— () 

67 

- 2 

85 

^ 4 
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51 

4 
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87 
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70 

- 7 

88 
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3 
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(> 
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73 

12 

91 

() 

5(i 

- 2 

74 

7 

92 

2 

57 

- 3 

75 

5 

93 

— () 

58 

-- 7 

7G 

4 

94 

™ 5 

59 

3 

77 

- 3 

95 

- 6 

GO 

4 

78 

- G 

96 

1 






J 


Tables 30.1 and 30.2 give two further examples. The first is a famous series of trend-free 
wheat-price indices compiled by Sir William Beveridge and extending over 370 years, 
a phenomenal length of time for economic series. The second is the deviation from a 
simple 11-year moving average of marriage rates for the years 1843-1896. 

Oscillation and Cycle 

30.3. We will now attempt to define more closely the sense in which we use the 
words “oscillation” and “cycle”. It is particularly important to exercise great care in 
the use of an accurate nomenclature because a great deal of the literature on this subject 
suffers from confusion due to loose wording. 
By a cyclical component of a time-series we shall mean one which is a strictly periodic 
function of the time, that is to say, for which there exists a period $\omega$ such that 

$$u_t = u_{t+\omega} = u_{t+2\omega} = \ldots = u_{t+k\omega} = \ldots \qquad (30.1)$$

whatever the value of t. The periodic functions which we shall consider in particular are 
the sine and cosine functions. If the series can be represented as the sum of a cyclical 
component and a random constituent, or by a cyclical component alone, we may speak 
of it as a cyclical series. 


30.4. If the series is not random it must move with more or less regularity about 
the mean value, and we shall then speak of it as oscillatory. The oscillatory movement 
may be in part due to random elements but must not be entirely so. A cyclical series is 
oscillatory, but an oscillatory series is not necessarily cyclical. 

An oscillatory movement may be the sum of two or more cyclical components. Consider, 
for instance, the sum of two periodic terms 

$$u_t = \sin\frac{2\pi t}{\omega_1} + \sin\frac{2\pi t}{\omega_2} .$$

If $\omega_1$ and $\omega_2$ are commensurable there will be numbers, and in particular a smallest number 
$\omega$, which is an exact multiple of both of them. This is clearly a period of the series. 
But if $\omega_1$ and $\omega_2$ are not commensurable there will be no period of this kind and the sum 
will be oscillatory but not cyclical. 
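The commensurable case may be illustrated with a small sketch. The periods 4 and 6 are hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
from math import lcm, pi, sin

def two_term_series(t, period_1, period_2):
    """Sum of two sine terms with the given periods."""
    return sin(2 * pi * t / period_1) + sin(2 * pi * t / period_2)

period_1, period_2 = 4, 6                      # hypothetical commensurable periods
common_period = lcm(period_1, period_2)        # smallest common multiple

def repeats_after(shift):
    """True if the series repeats itself after `shift` time units."""
    return all(
        abs(two_term_series(t, period_1, period_2)
            - two_term_series(t + shift, period_1, period_2)) < 1e-9
        for t in range(50)
    )

print(common_period, repeats_after(common_period))
print(repeats_after(period_1), repeats_after(period_2))
```

The series repeats after the common multiple but not after either of the separate periods. For incommensurable periods no such common multiple exists.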


30.5. It may be felt by the reader that we could reasonably extend the use of the 
word “ cyclical ” to cover series which are the sum of cyclical terms ; but the danger of 
doing so is that within certain limits any series can be represented as a sum of harmonic 
terms, even if it is not itself oscillatory, in virtue of Fourier’s theorem. Admittedly such 
a representation, to be exact, must in general consist of an infinite series of terms and is 
valid only in a certain range, but in practice a comparatively small number of terms often 
gives quite a good approximation. We do not call a function a polynomial because it 
can be expanded in powers of the variable by Taylor’s theorem; and correspondingly 
we shall not call it cyclical because it can be expanded as a sum of harmonic terms by 
Fourier’s theorem. On the whole it seems safer to avoid the word “ cyclical ” for series 
which consist of a finite number of cyclical terms. 

30.6. For our present purposes the main significance of the distinction we are attempting 
to make is that in a cyclical series the maxima and minima, apart from disturbances 
due to the superposition of a random element, occur at equal intervals of time and are 
therefore predictable for a long way into the future: for so long, in fact, as the constitution 
of the system remains unchanged. In oscillatory series, on the other hand, the distances 
from peak to peak, trough to trough or upcross to upcross, are not equal, but vary very 
considerably. Similarly, in the oscillatory series the amplitudes of the movements may 
vary very substantially, whereas in a cyclical series they should be constant (again, except 
in so far as superposed random elements disturb them). 

30.7. Now the time-series observed in practice are very rarely cyclical as we have 
defined the term. The only case among those cited at the beginning of the chapter in which 
there appears to be any cyclical movement is that of egg-production per hen in Table 29.5. 
The far more usual case is that of varying amplitude and period from peak to peak or upcross 
to upcross. We shall therefore begin our study of oscillatory movements by considering 
the kinds of scheme which can give rise to the observed phenomena ; and then we shall 
examine methods of deciding which of the possible schemes should be chosen as the 
hypothetical representation in particular cases. 

Tests for Randomness 

30.8. The first stage, when confronted with a fluctuating stationary series, is to 
examine whether the fluctuations are purely random. Tests of randomness are easy to 
find, and in fact the random series is the happy hunting-ground of the worker whose interests 
lie mainly in the mathematics of the direct theory of probability. We have considered 
some tests which are appropriate to the study of oscillatory movement in 21.43 to 21.46. 
Others which have gained popularity are based on the distribution of “runs” and on the 
correlation between successive members of the series. The reader will have no difficulty 
in composing others. All these tests are based on the non-parametric case, so that the 
alternative hypotheses are not usually brought specifically into view. We cannot therefore 
apply the general theory of Chapters 26 and 27 to determine “best” tests, and in the 
present state of knowledge are forced to be content with less definite ideas. So far as 
ease of application goes, the tests of 21.43 and 21.44 seem to have decided advantages, 
though they may be somewhat insensitive. The method of serial correlation, to which we 
refer below, gives a useful alternative in doubtful cases. In the sequel we shall suppose 
that before proceeding to search for systematic movements we have satisfied ourselves by 
one or more of these tests that such movements exist. 

30.9. We shall consider three schemes which can account for the typical oscillatory 
movement usually observed. 

(a) Moving Averages. We have already seen in Chapter 29 that a moving average 
of a purely random element can generate an oscillatory series with all the required properties 
of varying amplitude and mean distances: the Slutzky-Yule effect (29.25). Fig. 29.6 
illustrates the kind of oscillation which may arise. It is at least possible that some of the 
observed oscillations in time-series may be generated in this way; and in fact Slutzky 
(1937b) has given an interesting example in which a part of his series generated by the 
moving average happens to agree very closely with an observed series. 

(b) Sums of Cyclical Components. We may attempt, by Fourier analysis or the more 
general harmonic analysis, to represent the oscillations as the sum of a number of cyclical 
components. This is the classical approach. 

(c) Autoregression Equations. If a series is constructed by the recurrence formula 

$$u_t = f(u_{t-1}, u_{t-2}, \ldots) + \varepsilon_t \qquad (30.2)$$

where f is a mathematical function and $\varepsilon$ a “disturbance” function which may be a random 
variable, then under certain conditions the generated series is of the required type. We 
shall consider in particular the series 

$$u_{t+2} = -a u_{t+1} - b u_t + \varepsilon_{t+2} \qquad (30.3)$$

where a and b are constants and $\varepsilon$ is random. 

Table 30.3 (Fig. 30.1) shows a series of type (b) in the simplest case where only one 
cyclical component is involved, together with a random residual. Table 30.4 (Fig. 30.2) 
shows an autoregressive series constructed from random numbers by the formula 

$$u_{t+2} = 1.1 u_{t+1} - 0.5 u_t + \varepsilon_{t+2} . \qquad (30.4)$$
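A series of the autoregressive kind can be generated along the following lines. This is a sketch only: it uses the coefficients 1.1 and -0.5 and a rectangular disturbance with range -9.5 to 9.5, as in the heading of Table 30.4, but the two zero starting values, the seed and the function name are assumptions, and the output will not reproduce the tabulated series.

```python
import random

def autoregressive_series(n, coef_1=1.1, coef_2=-0.5, half_range=9.5, seed=0):
    """Generate n terms of u[t+2] = coef_1*u[t+1] + coef_2*u[t] + eps[t+2],
    with eps rectangular on (-half_range, half_range).
    Assumption: the recurrence is started from two zero values."""
    rng = random.Random(seed)
    u = [0.0, 0.0]
    for _ in range(n):
        eps = rng.uniform(-half_range, half_range)
        u.append(coef_1 * u[-1] + coef_2 * u[-2] + eps)
    return u[2:]

series = autoregressive_series(65)

# Count upcrosses of zero as a crude indication of the oscillatory character
upcrosses = sum(1 for x, y in zip(series, series[1:]) if x < 0 <= y)

print([round(x) for x in series[:10]])
print(upcrosses)
```

The oscillations so produced vary in amplitude and in the distance from upcross to upcross, although nothing but a random disturbance has been fed into the recurrence.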
TABLE 30.3 

Values of the Series $u_t = 10 \sin\dfrac{\pi t}{5} + \varepsilon_t$ where $\varepsilon_t$ is a Rectangular Random Variable with 
Range -5 to +5, rounded off to Nearest Unit. 

 Number of   Series.     Number of   Series.     Number of   Series.
 Term.                   Term.                   Term.
    1           3           21         11           41          5
    2           8           22         13           42         12
    3           6           23         10           43          7
    4           2           24          6           44          5
    5          -4           25         -5           45          3
    6          -7           26         -8           46         -2
    7          -9           27        -12           47        -12
    8          -9           28        -10           48        -12
    9         -10           29         -7           49         -8
   10          -1           30          0           50         -1
   11           8           31          1           51         11
   12           7           32          8           52         13
   13           6           33         13           53         12
   14           4           34          7           54          7
   15          -3           35          4           55          5
   16         -10           36         -9           56         -1
   17         -11           37         -9           57         -6
   18         -15           38         -6           58        -14
   19          -4           39         -4           59         -8
   20           4           40         -2           60         -1



Fig. 30.1. — Graph of the Values of Table 30.3. 
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TABLE 30.4 

Values of Series = 1‘1 %j;+i — 0*5 Uf + £^ 4.2 where £^ 4.3 is a Rectangular Random 
Variable with Range — 9-5 to 9-5, rounded off to Nearest Unit. 



Fig. 30.2.— Graph of the Values of Table 30.4. 
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30.10. It is qiiite possible that theoretical reasons may suggest other schemes for 
study as the subject progresses. For instance, we might wish to consider series defined 
by (hfferential equations, on the analogy of the similar equations determining oscillations 
in physical phenomena such as vibrating strings or electrical discharges. Something has, 
in fact, already been done in this direction. We shall, however, confine our attention 
to the three schemes indicated above, and particularly the second and third. 
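
The three schemes can be imitated numerically. The following is a minimal sketch and not the procedure by which the tables of this chapter were constructed: the function names, the fixed seeds, the burn-in length and the use of Python's pseudo-random generator in place of tabulated random numbers are all assumptions made for illustration.

```python
import math
import random


def moving_average_series(n, weights, half_range=5.0, seed=0):
    """Moving-average scheme: u_j = a_1*e_j + a_2*e_{j+1} + ... + a_m*e_{j+m-1}."""
    rng = random.Random(seed)
    m = len(weights)
    # Rectangular (uniform) random elements; n + m - 1 of them give n averaged terms.
    e = [rng.uniform(-half_range, half_range) for _ in range(n + m - 1)]
    return [sum(w * e[j + i] for i, w in enumerate(weights)) for j in range(n)]


def harmonic_series(n, amplitude=10.0, period=10, half_range=5.0, seed=0):
    """Cyclical scheme: u_t = amplitude*sin(2*pi*t/period) + e_t, rounded to a unit."""
    rng = random.Random(seed)
    return [
        round(amplitude * math.sin(2 * math.pi * t / period)
              + rng.uniform(-half_range, half_range))
        for t in range(1, n + 1)
    ]


def autoregressive_series(n, a=-1.1, b=0.5, half_range=9.5, burn_in=50, seed=0):
    """Autoregressive scheme: u_{t+2} = -a*u_{t+1} - b*u_t + e_{t+2}.

    The series is started from zero and the first burn_in terms are discarded
    (an assumption of this sketch) so that the starting values are forgotten.
    """
    rng = random.Random(seed)
    u = [0.0, 0.0]
    for _ in range(n + burn_in):
        u.append(-a * u[-1] - b * u[-2] + rng.uniform(-half_range, half_range))
    return u[-n:]
```

With the default arguments, `harmonic_series(60)` follows the definition used for Table 30.3 and `autoregressive_series(65)` the formula (30.4), though the particular values will of course differ from those printed.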


30.11. On the face of it, an observed series exhibiting the typical movements in 
amplitude and period might be due to any one of the three schemes or even to a combination
of them. We require, in the first instance, some objective criterion for deciding which of
them is applicable in particular cases. Inspection of the primary data, though useful, is
quite an unreliable guide in making a decision on this point, particularly if the series 
is short. Experience seems to indicate that few things are more likely to mislead in the 
theory of oscillatory series than attempts to determine the nature of the oscillatory move- 
ment by mere contemplation of the series itself ; and yet this is the method, if one can 
dignify it by such a term, which has perhaps been most widely used in the past. 


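Serial Correlation

30.12. Suppose our series of values is $u_1, u_2, \ldots, u_n$. Let us form the product-moment
correlation coefficient between successive terms, i.e.

$$r_1 = \frac{\operatorname{cov}(u_j, u_{j+1})}{\sqrt{\operatorname{var} u_j \, \operatorname{var} u_{j+1}}} \tag{30.5}$$

There will be $(n - 1)$ pairs entering into the correlation, and the variances of $u_j$ and
$u_{j+1}$ differ only in the fact that the first relates to the terms $u_1, u_2, \ldots, u_{n-1}$ and the second
to the terms $u_2, u_3, \ldots, u_n$. The coefficient is called the serial correlation coefficient
of the first order, or more briefly the first serial correlation.*

More generally, let us define a coefficient of order $k$:

$$r_k = \frac{\operatorname{cov}(u_j, u_{j+k})}{\sqrt{\operatorname{var} u_j \, \operatorname{var} u_{j+k}}} \tag{30.6}$$

$$= \frac{\dfrac{1}{n-k} \sum\limits_{j=1}^{n-k} u_j u_{j+k} - \dfrac{1}{(n-k)^2} \sum\limits_{j=1}^{n-k} u_j \sum\limits_{j=1}^{n-k} u_{j+k}}{\sqrt{\left\{\dfrac{1}{n-k} \sum\limits_{j=1}^{n-k} u_j^2 - \dfrac{1}{(n-k)^2} \Big(\sum\limits_{j=1}^{n-k} u_j\Big)^2\right\} \left\{\dfrac{1}{n-k} \sum\limits_{j=1}^{n-k} u_{j+k}^2 - \dfrac{1}{(n-k)^2} \Big(\sum\limits_{j=1}^{n-k} u_{j+k}\Big)^2\right\}}} \tag{30.7}$$

By convention we define

$$r_0 = 1 \tag{30.8}$$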


30.13. In practice we often require to calculate serial correlations up to r^g and for 
long series as many as 60. The arithmetic is tedious but may be systematised so as to 
reduce labour, which arises chiefly in the determination of cross-products forming the 
covariances. 

The series of n terms is written down vertically on each of two slips of paper, the spacing 
being equal on the two slips. This can very conveniently be done on a Burroughs tabulator 
with a split keyboard, the series being recorded in duplicate and the resulting strip cut up 

* It is sometimes convenient to confine this expression to values calculated from samples, the
corresponding values for the infinite series being termed “ autocorrelations ” and denoted by a Greek ρ.





the middle. To calculate the first product-sum we pin the slips so that the first term
on the right-hand slip is opposite the second term on the left-hand slip, and hence so that
the $j$th term on the right is opposite to the $(j + 1)$th on the left all the way down. For
most series the differences of two terms which are opposite can be obtained mentally by
subtraction, squared, and set up on an adding-machine. The sum of squares of differences
is thus determined, and the cross-product found from the simple identity

$$2 \sum (XY) = \sum (X^2) + \sum (Y^2) - \sum (X - Y)^2.$$

We then move the right-hand slip down one space so that the $j$th term is opposite the
$(j + 2)$th term on the left and repeat the process; and so on to as many terms as may
be required.

In this process $\sum (X^2)$ and $\sum (Y^2)$ are required at each stage, and it is as well to determine
them by cumulative summation from the two ends of the series. $\sum (X)$ and $\sum (Y)$
are also required. It is also convenient on occasion to reduce the series to zero mean
approximately before beginning the analysis.


Example 30.1 

To illustrate the arithmetic we will take a very trivial example which the reader should 
check for himself. Take the series 

-5, -6, -2, 4, 7, 3, 1, -5, -1, 2.

We set up the following scheme of tabulation for calculating serial correlations up to the 
fifth order : — 




                    Σ(X)             Σ(Y)           Σ(X²)          Σ(Y²)
n - k.    k.   (from beginning    (from end       (from         (from end).    Σ(X - Y)².    Σ(XY).
                 of series).      of series).    beginning).

  10      0         -2               -2             170            170              0          170
   9      1         -4                3             166            145            143           84
   8      2         -3                9             165            109            344          -35
   7      3          2               11             140            105            445         -100
   6      4          1                7             139             89            380          -76
   5      5         -2                0             130             40            172           -1

The number $n - k$ is the number of pairs entering into the $k$th correlation. $\sum (X)$ is the
sum of $n - k$ terms beginning at the first term, $\sum (Y)$ the corresponding sum of the last
$n - k$ terms, and similarly for $\sum (X^2)$ and $\sum (Y^2)$. These are the quantities required to
calculate the variances entering into the denominator of the $k$th serial correlation. The
quantities $\sum (X - Y)^2$ are calculated by the moving-slip method described above.

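We now calculate the correlation coefficients in the usual way, e.g. for $r_1$

$$\operatorname{var} X = \frac{166}{9} - \left(\frac{-4}{9}\right)^2 = 18.247,$$

$$\operatorname{var} Y = \frac{145}{9} - \left(\frac{3}{9}\right)^2 = 16,$$

$$\operatorname{cov}(X, Y) = \frac{84}{9} - \left(\frac{-4}{9}\right)\left(\frac{3}{9}\right) = 9.4815,$$

$$r_1 = \frac{9.4815}{\sqrt{18.247 \times 16}} = +0.55;$$

and for $r_5$

$$\operatorname{var} X = \frac{130}{5} - \left(\frac{-2}{5}\right)^2 = 25.840,$$

$$\operatorname{var} Y = \frac{40}{5} - \left(\frac{0}{5}\right)^2 = 8,$$

$$\operatorname{cov}(X, Y) = \frac{-1}{5} - \left(\frac{-2}{5}\right)\left(\frac{0}{5}\right) = -0.200,$$

$$r_5 = -0.01.$$

When $n$ is large and the origin is chosen so that the mean of the whole series is approximately
zero, a sufficiently good value of $r$ is given by

$$\frac{\sum (XY)}{\sqrt{\sum (X^2) \sum (Y^2)}},$$

the corrections required to adjust the sums of squares and products to values about the mean being small;
but this approximation must be used with some care and in any case the first two or three
serial coefficients should be worked out exactly.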
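
The arithmetic of Example 30.1 can be retraced mechanically. The sketch below is an illustration only; the function name `serial_correlation` is an assumption. It follows the definition of the serial correlation of order k and, like the moving-slip method, obtains the cross-product from the sum of squared differences.

```python
import math


def serial_correlation(u, k):
    """Serial correlation of order k: the product-moment correlation between
    the first n - k terms (X) and the last n - k terms (Y) of the series u."""
    n = len(u)
    if not 0 <= k < n - 1:
        raise ValueError("k must satisfy 0 <= k < n - 1")
    m = n - k                      # number of pairs entering the correlation
    x, y = u[:m], u[k:]
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_yy = sum(v * v for v in y)
    # Moving-slip identity: 2*S(XY) = S(X^2) + S(Y^2) - S((X - Y)^2).
    sum_dd = sum((p - q) ** 2 for p, q in zip(x, y))
    sum_xy = (sum_xx + sum_yy - sum_dd) / 2
    var_x = sum_xx / m - (sum_x / m) ** 2
    var_y = sum_yy / m - (sum_y / m) ** 2
    cov_xy = sum_xy / m - (sum_x / m) * (sum_y / m)
    return cov_xy / math.sqrt(var_x * var_y)


series = [-5, -6, -2, 4, 7, 3, 1, -5, -1, 2]
r1 = serial_correlation(series, 1)   # about +0.55
r5 = serial_correlation(series, 5)   # about -0.01
```

Because separate means and variances are used for the two overlapping stretches, each coefficient is a true correlation and cannot exceed unity in absolute value.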


The Correlogram

30.14. The diagram obtained by graphing $r_k$ as ordinate against $k$ as abscissa and
joining the points each to the next is called a correlogram. We shall give a number of
examples below and shall see that the form of the correlogram provides a method of
discriminating between the various types of oscillatory series.


30.15. Suppose, for example, that the series is generated by a moving average of
random elements with weights $a_1, a_2, \ldots, a_m$. The typical term of the series is then

$$u_j = a_1 \varepsilon_j + a_2 \varepsilon_{j+1} + \ldots + a_m \varepsilon_{j+m-1} \tag{30.9}$$

Without loss of generality we may take $E(\varepsilon) = 0$ and hence $E(u_j) = 0$. Then

$$E(u_j u_{j+k}) = E\left[\{a_1 \varepsilon_j + \ldots + a_m \varepsilon_{j+m-1}\}\{a_1 \varepsilon_{j+k} + \ldots + a_m \varepsilon_{j+k+m-1}\}\right].$$

Since

$$E(\varepsilon_j \varepsilon_{j+k}) = 0, \quad k \ne 0$$
$$= v, \text{ say, if } k = 0$$

we have

$$E(u_j u_{j+k}) = v(a_1 a_{k+1} + a_2 a_{k+2} + \ldots + a_{m-k} a_m) \tag{30.10}$$

provided that $m > k$. But if $k \ge m$ then

$$E(u_j u_{j+k}) = 0. \tag{30.11}$$

Thus for an infinite series generated by the moving average the serial correlations vanish
for $k \ge m$, and the correlogram from that point onwards coincides with the $x$-axis. In
particular, if the $a$'s are all equal to $1/m$, we have

$$E(u_j u_{j+k}) = (m - k) \frac{v}{m^2}$$

and hence

$$r_k = 1 - \frac{k}{m} \tag{30.12}$$

so that the correlogram consists of a straight line joining the point (0, 1) to ($m$, 0), together
with the $x$-axis from the latter point onwards.

Example 30.2

The weights of the Spencer 21-point formula are

$$\frac{1}{350}\left[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, \ldots\right].$$

Apart from the divisor 350, which may be disregarded for present purposes, the sum of
squares of weights is 17,542. The products (30.10) and the corresponding serial correlations
are as follows:


  k.    Σ a_j a_{j+k}      r_k          k.    Σ a_j a_{j+k}      r_k

  0        17,542         1.000        11         -930         -0.053
  1        16,786         0.957        12         -528         -0.030
  2        14,667         0.836        13         -214         -0.012
  3        11,584         0.660        14          -27         -0.002
  4         8,085         0.461        15           50          0.003
  5         4,726         0.269        16           59          0.003
  6         1,951         0.111        17           40          0.002
  7             6         0.000        18           19          0.001
  8        -1,074        -0.061        19            6          0.000
  9        -1,430        -0.082        20            1          0.000
 10        -1,298        -0.074        21            0          0.000
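
The products and correlations for a moving average can be checked from (30.10) directly. The sketch below is illustrative: the function names are assumptions, and the full set of 21 weights is completed here on the assumption that the Spencer formula is symmetrical about its central weight of 60, which reproduces the stated sum of squares of 17,542.

```python
# Numerators of the Spencer 21-point weights (divisor 350 omitted); the second
# half is filled in by symmetry about the central weight, an assumption here.
SPENCER_21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
              57, 47, 33, 18, 6, -2, -5, -5, -3, -1]


def lagged_product_sum(weights, k):
    """Sum of a_j * a_{j+k} over the m - k available pairs; zero when k >= m."""
    m = len(weights)
    if k >= m:
        return 0
    return sum(weights[j] * weights[j + k] for j in range(m - k))


def moving_average_correlogram(weights, max_lag):
    """Theoretical r_k (k = 0..max_lag) of a moving average of random elements."""
    s0 = lagged_product_sum(weights, 0)
    return [lagged_product_sum(weights, k) / s0 for k in range(max_lag + 1)]


spencer_r = moving_average_correlogram(SPENCER_21, 21)
equal_r = moving_average_correlogram([1, 1, 1, 1], 5)   # r_k = 1 - k/4, then zero
```

Any common divisor of the weights cancels in the ratio, which is why the 350 may be disregarded.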



The correlogram is shown in Fig. 30.3. From $k = 13$ onwards the correlations are very
small, and from $k = 21$ onwards they vanish completely.

30.16. Suppose now that the series consists of a sine term $A \sin \theta t$ plus $\varepsilon_t$, a random
residual. As before, we may suppose $E(u_t) = 0$, and hence

$$E(u_j u_{j+k}) = E\left[\{A \sin \theta j + \varepsilon_j\}\{A \sin \theta (j + k) + \varepsilon_{j+k}\}\right] \tag{30.13}$$

$$= A^2 E\{\sin \theta j \sin \theta (j + k)\}$$

$$= \frac{A^2}{n} \sum_{j=1}^{n} \{\sin \theta j \sin \theta (j + k)\}$$

$$= \frac{A^2}{2n} \sum_{j=1}^{n} \{\cos \theta k - \cos \theta (2j + k)\}$$

$$= \frac{A^2}{2} \cos \theta k - \frac{A^2}{2n} \cos \theta (k + n + 1) \frac{\sin n\theta}{\sin \theta}. \tag{30.14}$$

Thus for large $n$ we have effectively, unless $\theta$ is small,

$$E(u_j u_{j+k}) = \frac{A^2}{2} \cos \theta k = B \cos \theta k, \text{ say.} \tag{30.15}$$

Similarly we find

$$E(u_j^2) = B + \operatorname{var} \varepsilon = C, \text{ say.} \tag{30.16}$$

Hence

$$r_k = \frac{B}{C} \cos \theta k, \quad k > 0. \tag{30.17}$$


In short, for an infinite cyclical series the correlogram itself is a harmonic with period 
equal to that of the original harmonic component. 


30.17. When the original series is the sum of several harmonic terms the formula
for $r_k$ will, in general, be the sum of harmonics, not necessarily with the same periods.
Thus the correlogram will present a sinusoidal form which will not degenerate to the $x$-axis
after some fixed point and will not, in fact, be damped.
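
The limiting correlogram of a single harmonic with a random residual may be tabulated as follows. This is a sketch only: the function name is an assumption, and the numerical case uses the harmonic and rectangular residual described for Table 30.3, taking the residual variance as (range)²/12 = 100/12 and ignoring the effect of rounding.

```python
import math


def harmonic_correlogram(amplitude, theta, residual_variance, max_lag):
    """Limiting r_k = (B/C) cos(theta*k) for k > 0, with B = A^2/2 and
    C = B + var(residual); r_0 = 1 by convention."""
    b = amplitude ** 2 / 2
    c = b + residual_variance
    return [1.0] + [b / c * math.cos(theta * k) for k in range(1, max_lag + 1)]


# A = 10, theta = pi/5 (period 10), rectangular residual on (-5, +5).
r = harmonic_correlogram(10.0, math.pi / 5, 100 / 12, 30)
```

Here B/C = 50/58.33, so that the correlogram returns to about 0.857 at k = 10, 20, 30 without any damping.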


30.18. Consider now the series defined by (30.3), namely

$$u_{t+2} = -a u_{t+1} - b u_t + \varepsilon_{t+2}.$$

This is a difference equation which is easily solved by the usual methods.* The general
solution of

$$u_{t+2} + a u_{t+1} + b u_t = 0 \tag{30.18}$$

is

$$u_t = p^t (A \cos \theta t + B \sin \theta t) \tag{30.19}$$

where

$$p = \sqrt{b}, \qquad \cos \theta = -\frac{a}{2\sqrt{b}}. \tag{30.20}$$

Here $\sqrt{b}$ is to be taken with positive sign, and it is assumed that $4b > a^2$. We also assume
that $p$ is not greater than unity. The contrary case is mathematically permissible, but
it implies that $u_t$ increases without limit, which is outside the domain of our consideration.


* See, for instance, Milne-Thomson, Calculus of Finite Differences, chapter 13.

Consider now the series

$$\sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j+1} \tag{30.21}$$

where $\xi_j$ is a particular solution of (30.19) such that $\xi_0 = 0$, $\xi_1 = 1$, i.e. such that

$$\xi_j = \frac{2}{\sqrt{4b - a^2}} \, p^j \sin \theta j. \tag{30.22}$$

On substituting (30.21) in the original equation it will be found to provide a particular
solution. The general solution is then

$$u_t = p^t (A \cos \theta t + B \sin \theta t) + \sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j+1} \tag{30.23}$$

As $p$ is not greater than unity we shall, in general, find that the first term in this expression
is damped out of existence. If we may regard our series as having been started up
some time prior to the point $t = 0$, the solution is effectively

$$u_t = \sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j+1}. \tag{30.24}$$

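30.19. In this form the autoregressive scheme is seen to be a moving average of
a component $\varepsilon$ with infinite extent and damped harmonic weights. Consider now its
correlogram. We have

$$\sum_{j=0}^{\infty} \xi_j \xi_{j+k} = \frac{4}{4b - a^2} \sum_{j=0}^{\infty} \{p^{2j+k} \sin \theta j \sin \theta (j + k)\}$$

$$= \frac{2p^k}{4b - a^2} \sum_{j=0}^{\infty} \left[p^{2j} \{\cos \theta k - \cos \theta (2j + k)\}\right]$$

$$= \frac{2p^k}{4b - a^2} \left\{\frac{\cos \theta k}{1 - p^2} - \frac{\cos \theta k - p^2 \cos \theta (k - 2)}{1 - 2p^2 \cos 2\theta + p^4}\right\}. \tag{30.25}$$

Now

$$E(u_t u_{t+k}) = E\left\{\sum_j \xi_j \varepsilon_{t-j+1} \sum_j \xi_j \varepsilon_{t+k-j+1}\right\} = \operatorname{var} \varepsilon \sum_{j=0}^{\infty} \xi_j \xi_{j+k}. \tag{30.26}$$

Thus

$$r_k = \frac{\operatorname{var} \varepsilon \sum\limits_{j=0}^{\infty} \xi_j \xi_{j+k}}{\operatorname{var} \varepsilon \sum\limits_{j=0}^{\infty} \xi_j^2}$$

which, on substitution from (30.25), reduces to

$$r_k = \frac{p^k}{(1 + p^2) \sin \theta} \{\sin (k + 1)\theta - p^2 \sin (k - 1)\theta\}. \tag{30.27}$$

Writing

$$\tan \psi = \frac{1 + p^2}{1 - p^2} \tan \theta, \tag{30.28}$$

we find

$$r_k = \frac{p^k \sin (k\theta + \psi)}{\sin \psi}, \quad k \ge 0. \tag{30.29}$$

From this we see that the correlogram will oscillate with period $2\pi/\theta$, but that, owing to
the factor $p^k$, it will be damped. If $k$ is negative the formula applies, except that $|k|$
must be used instead of $k$ on the right-hand side of (30.29).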
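
The damped correlogram of the autoregressive scheme can be evaluated numerically. The following is a sketch under the sign convention u_{t+2} = -a u_{t+1} - b u_t + ε_{t+2}, so that the scheme with coefficients 1.1 and -0.5 has a = -1.1, b = 0.5; the function name is an assumption. As a check, the values should satisfy the same recurrence as the series itself with the random term omitted, r_k = -a r_{k-1} - b r_{k-2}.

```python
import math


def ar2_correlogram(a, b, max_lag):
    """Theoretical r_k = p^k sin(k*theta + psi) / sin(psi), k = 0..max_lag,
    for u_{t+2} = -a*u_{t+1} - b*u_t + e_{t+2} with 4b > a^2 and 0 < b <= 1."""
    if not (4 * b > a * a and 0 < b <= 1):
        raise ValueError("requires 4b > a^2 and 0 < b <= 1")
    p = math.sqrt(b)                          # damping factor
    theta = math.acos(-a / (2 * p))           # autoregressive angle, 0 < theta < pi
    # tan(psi) = (1 + p^2)/(1 - p^2) * tan(theta); atan2 keeps psi in (0, pi).
    psi = math.atan2((1 + b) * math.sin(theta), (1 - b) * math.cos(theta))
    return [p ** k * math.sin(k * theta + psi) / math.sin(psi)
            for k in range(max_lag + 1)]


r = ar2_correlogram(-1.1, 0.5, 30)
```

For this scheme p = 0.7071, and by the thirtieth lag the factor p^k has reduced the theoretical correlation to a very small quantity.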


30.20. We thus reach the interesting conclusion that the three types of series considered
in 30.9, however similar to the eye, will have distinct types of correlogram, provided
that the series are long enough for the observed correlations to approach the expected
values for an infinite series. The correlogram of a series generated by moving averages,
though it may oscillate as in Example 30.2, will vanish after a certain point; that of a
series of harmonic terms will oscillate, but will not vanish or be damped; that of the
autoregressive scheme will oscillate and will not vanish, but it will be damped. The correlogram
therefore offers a theoretical basis for discriminating between the three types of oscillatory
series.


30.21. Unfortunately the series with which we have to work are very frequently
too short to enable a decisive distinction to be made. We shall see below that divergence 
between theory and observation can be very considerable, and that sampling theory has 
not yet advanced far enough to enable us to make objective judgments in probability 
about its significance. We shall have to rely on limited experimental evidence and to 
some extent on intuitive judgment in reaching conclusions. If, therefore, the remainder 
of this chapter contains gaps in the treatment and leaves certain points undecided the 
reader will understand that the reason is ignorance rather than indifference. 

Examples of Correlograms from Observed Series 

30.22. We will in the first place give the correlograms of a few of the series given
earlier in this and the preceding chapter.

Example 30.3

In Table 30.2 we gave the deviations from the trend of marriage rates for the years
1843-1896. The first 20 serial correlations of this series are shown in Table 30.5 and the
correlogram in Fig. 30.4.

TABLE 30.5

Serial Correlations of the Marriage Data of Table 30.2.

  Order of                      Order of
Correlation k.      r_k       Correlation k.      r_k

      1            0.563            11          -0.080
      2           -0.089            12          -0.136
      3           -0.498            13          -0.132
      4           -0.631            14          -0.058
      5           -0.467            15          -0.095
      6           -0.025            16          -0.126
      7            0.353            17          -0.036
      8            0.396            18           0.131
      9            0.254            19           0.209
     10            0.104            20           0.205

Fig. 30.4. Correlogram of Marriage Data of Table 30.2 (Table 30.5).


The correlogram is smooth and suggests the operation of an autoregressive scheme.
There is little indication that a moving average, at least of extent less than 20, would account
for the series, but on the other hand some damping appears to be present.

Example 30.4 

Table 30.6 shows the first 60 serial correlations of the Beveridge series of Table 30.1,
the correlogram being given in Fig. 30.5.

TABLE 30.6

Serial Correlations of the Beveridge Wheat-Price Index of Table 30.1.

  Order of
Correlation k.     r_k         k.       r_k         k.       r_k         k.       r_k

      1           0.562        16      0.158        31      0.060        46     -0.036
      2           0.103        17      0.109        32     -0.008        47     -0.013
      3          -0.075        18      0.002        33     -0.039        48      0.042
      4          -0.092        19     -0.075        34      0.007        49      0.062
      5          -0.082        20     -0.062        35      0.056        50      0.065
      6          -0.136        21     -0.021        36      0.010        51      0.050
      7          -0.211        22     -0.062        37     -0.004        52      0.009
      8          -0.261        23     -0.088        38     -0.015        53     -0.027
      9          -0.192        24     -0.084        39     -0.047        54     -0.053
     10          -0.070        25     -0.076        40     -0.047        55     -0.073
     11          -0.003        26     -0.091        41      0.008        56     -0.106
     12          -0.015        27     -0.052        42      0.034        57     -0.084
     13          -0.012        28     -0.032        43      0.065        58     -0.019
     14           0.047        29     -0.012        44      0.099        59      0.003
     15           0.101        30      0.059        45      0.009        60      0.010

The correlogram here is almost certainly damped. The oscillations persist in a most
remarkable way, notwithstanding the diminishing amplitude, and the presumption is
a strong one that the series is of the damped type.

Example 30.5

In Table 29.8 (page 386) we gave the residuals of a sheep-population series for the
years 1871 to 1935. Table 30.7 shows the first 30 serial correlations of this series and
Fig. 30.6 the correlogram. Again the correlogram is oscillatory, but the damping is not
so clear.


TABLE 30.7

Serial Correlations of the Sheep Data of Table 29.8.

  Order of
Correlation k.     r_k         k.       r_k         k.       r_k

      1           0.595        11     -0.142        21     -0.381
      2          -0.151        12     -0.172        22     -0.118
      3          -0.601        13     -0.186        23      0.173
      4          -0.537        14     -0.128        24      0.343
      5          -0.138        15      0.052        25      0.352
      6           0.144        16      0.276        26      0.154
      7           0.203        17      0.439        27     -0.203
      8           0.118        18      0.293        28     -0.456
      9           0.006        19     -0.074        29     -0.415
     10          -0.078        20     -0.359        30     -0.184

Fig. 30.6. Correlogram of the Sheep Population Data of Table 29.8 (Table 30.7).

Significance of a Correlogram

30.23. The foregoing examples illustrate one of the main difficulties we have to face 
in correlogram analysis. On intuitive grounds we seem to be justified in rejecting the 
scheme of moving averages as a possible scheme for the series of these examples, since the 
oscillations in the correlograms persist ; but we can no doubt find moving averages which 
will produce such correlograms, though their extents would have to be long (over 60 in 
the case of the Beveridge series) and their weights artificial. The only final test seems to 
be to ascertain such a moving average and then to examine whether it will predict further 
terms in the series if such can be observed. 

30.24. Distinction between the scheme of harmonic components and the auto- 
regressive scheme is even more difficult for short series, since the correlograms for the 
latter do not damp out according to expectation. Consider in fact an autoregressive 
scheme of the simple linear type (30.3). There will be the usual variation in length from 
peak to peak and in amplitude ; but if the section of the series is a comparatively short 
one, covering, say, four or five oscillations, the oscillations will not have time to get very 
much out of step and the serial correlations will be systematically larger than one would 
expect for an infinite series. This effect is exhibited in Table 30.8 and Fig. 30.7, which
give the serial correlations and the correlogram for the series of Table 30.4, given by the
formula

$$u_{t+2} = 1.1 u_{t+1} - 0.5 u_t + \varepsilon_{t+2}.$$

Here the damping factor $p = \sqrt{b} = 0.7071$, and by the thirtieth correlation $r_k$ should be
very small, less than 0.002 in absolute magnitude. Actually it is 100 times as large. The
mere fact that an observed correlogram for a short series fails to damp very rapidly is
not, therefore, a very definite indication that the series is not ruled by the autoregressive 
scheme. On the contrary, failure to damp may be expected. 


30.25. We are on firmer ground when considering the significance of a correlogram 
in the sense of judging whether it can be derived from a random series. 

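(a) The variance of $r_k$ in a random series of $n$ terms is approximately $\frac{1}{n - k}$, provided
that $n$ is large. For

$$E\left\{\frac{1}{n-k} \sum (x_j x_{j+k})\right\}^2 = \frac{1}{(n-k)^2} E\left\{\sum x_j^2 x_{j+k}^2 + 2 \sum x_j x_{j+k} x_m x_{m+k}\right\}, \quad j \ne m$$

$$= \frac{1}{n-k} \operatorname{var}^2 x.$$

Hence, for large samples,

$$\operatorname{var} r_k = \frac{1}{n-k} \, \frac{\operatorname{var}^2 x}{\operatorname{var}^2 x} = \frac{1}{n-k}. \tag{30.30}$$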


R. L. Anderson (1942) has recently given exact results for the significance of a serial 
correlation. 

(b) For our purposes, however, the important point is not whether a particular serial
coefficient is significant, but whether the oscillatory character of the correlogram as a whole
is so. Here we have to form an intuitive judgment, but it can hardly be doubted that
the undulations in Figs. 30.4 to 30.6 are not accidental. Something exists to be explained
as a systematic effect, though what that effect is may be more difficult to decide.

TABLE 30.8

Serial Correlations of the Artificial Series of Table 30.4.

  Order of
Correlation k.     r_k         k.       r_k         k.       r_k

      1           0.70         11     -0.05         21      0.05
      2           0.29         12     -0.17         22     -0.12
      3           0.01         13     -0.27         23     -0.28
      4          -0.17         14     -0.31         24     -0.43
      5          -0.27         15     -0.30         25     -0.57
      6          -0.25         16     -0.18         26     -0.56
      7          -0.13         17      0.12         27     -0.26
      8           0.07         18      0.29         28      0.02
      9           0.12         19      0.33         29      0.17
     10           0.05         20      0.22         30      0.27
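
The large-sample result (30.30) gives a rough yardstick for a single coefficient. The sketch below is illustrative only: the function names, and the choice of twice the standard error as the limit, are assumptions and not a criterion laid down in the text. The marriage series of Example 30.3 covers 1843-1896, i.e. n = 54 terms.

```python
import math


def random_series_standard_error(n, k):
    """Approximate standard error of r_k in a random series: sqrt(1/(n - k))."""
    return 1 / math.sqrt(n - k)


def lags_beyond_two_standard_errors(correlations, n):
    """Orders k (counted from 1) whose |r_k| exceeds twice the approximate
    standard error; the factor 2 is a conventional, assumed, choice."""
    return [k for k, r in enumerate(correlations, start=1)
            if abs(r) > 2 * random_series_standard_error(n, k)]


marriage_r = [0.563, -0.089, -0.498, -0.631, -0.467]   # r_1..r_5 of Table 30.5
flagged = lags_beyond_two_standard_errors(marriage_r, 54)
```

Such a test speaks only to individual coefficients; it does not settle whether the oscillation of the correlogram as a whole is systematic.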

30.26. We shall proceed to study the autoregressive scheme and the scheme of 
cyclical components in more detail, without prejudice for the time being to the question 
as to which is the better representation in particular cases. This latter is not, in fact,
entirely a statistical matter, and we shall return to it in 30.39.

The Autoregressive Scheme

30.27. We consider in the first instance the simplified scheme of equation (30.3).
The theoretical correlogram for a series generated by this equation is of the damped type
given by (30.29),

$$r_k = \frac{p^k \sin (k\theta + \psi)}{\sin \psi},$$

where $2\pi/\theta$ is the autoregressive period of the regression equation and is given by

$$\cos \theta = -\frac{a}{2\sqrt{b}}.$$

The typical series of this kind has no “ period ” in the strict sense. The lengths from
peak to peak or from upcross to upcross vary in the characteristic way. It appears from
experiment (but has not, I think, been shown theoretically) that the distribution of distances
from peak to peak is of the unimodal type with a central value somewhere near
the mean distance between peaks; and similarly for troughs and upcrosses. In speaking
of the “ period ” of an autoregressive series we mean the central value of one of these
distributions. The question we have now to consider is whether this period is the same
as the autoregressive period $2\pi/\theta$ of the regression equation.


30.28. We have seen in 29.26 that the mean distance between upcrosses of the
series generated by the moving average whose weights are $\xi_1, \ldots, \xi_m$ is given by $2\pi/\phi$,
say, where

$$\cos \phi = \frac{\sum\limits_{j=1}^{m-1} \xi_j \xi_{j+1}}{\sum\limits_{j=1}^{m} \xi_j^2}.$$

Substituting for $\xi$ from (30.22) and using (30.25), we find

$$\cos \phi = \frac{\dfrac{2p}{4b - a^2} \left\{\dfrac{\cos \theta}{1 - p^2} - \dfrac{\cos \theta \, (1 - p^2)}{1 - 2p^2 \cos 2\theta + p^4}\right\}}{\dfrac{2}{4b - a^2} \left\{\dfrac{1}{1 - p^2} - \dfrac{1 - p^2 \cos 2\theta}{1 - 2p^2 \cos 2\theta + p^4}\right\}}$$

$$= \frac{2p \cos \theta}{1 + p^2}$$

$$= -\frac{a}{1 + b}. \tag{30.31}$$

Thus the mean period as defined by upcrosses is

$$2\pi \Big/ \cos^{-1}\left(-\frac{a}{1 + b}\right), \tag{30.32}$$

whereas that for the autoregressive period of the equation is

$$2\pi \Big/ \cos^{-1}\left(-\frac{a}{2\sqrt{b}}\right). \tag{30.33}$$

30.29. The mean period between upcrosses is thus not the same as the autoregressive
period. The two are very close for many of the values of $a$ and $b$ arising in practice. For
instance, when $b = 1$ they are identical; when $a = -1$, $b = 0.5$ their ratio is 1.07. One
might infer that an estimate of the period of an autoregressive scheme can be obtained
from the correlogram, but this generalisation requires some important qualifications.

(a) Firstly, the ratio of (30.33) to (30.32) is not necessarily close to unity for values
of $b$ in the neighbourhood of $a^2/4$, i.e. when $\theta$ is small and the autoregressive period is long.
Consider, for instance, the series generated by

$$u_{t+2} = 1.2 u_{t+1} - 0.4 u_t + \varepsilon_{t+2}.$$

We have

$$\cos \theta = -\frac{a}{2\sqrt{b}} = \frac{1.2}{2\sqrt{0.4}} = 0.9499, \quad \theta = 18.2°, \quad \text{period} = 19.7 \text{ units.}$$

However, for $\phi$,

$$\cos \phi = \frac{1.2}{1.4} = 0.8571, \quad \phi = 31°, \quad \text{period} = 11.6 \text{ units.}$$

The mean distance between upcrosses, and a fortiori that between peaks, is very much
shorter than the autoregressive period.

(b) The mean distance between upcrosses may miss certain oscillations above or
below the $x$-axis, so that it overestimates the period between peaks or troughs. On the
other hand, the latter may include ripples on the main wave which we wish to ignore.
The reader can verify for himself, by constructing an autoregressive series by some such
formula as the above, how difficult it is to draw the line in particular cases. The difficulty,
however, must be faced, for it is precisely the kind which we meet in dealing with observed
series.

(c) Owing to the appearance of the phase angle $\psi$ in equation (30.29) the starting-point
of the correlogram ($k = 0$) is not to be regarded as a maximum. The period of the
correlogram is therefore to be calculated either by ignoring this point or by reference to
distances between troughs and upcrosses in the correlogram.
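
The two periods may be compared numerically for any admissible pair of constants. The sketch below assumes the sign convention u_{t+2} = -a u_{t+1} - b u_t + ε_{t+2}; the function names are illustrative. Worked to full precision, the autoregressive period for the scheme with coefficients 1.2 and -0.4 comes out near 19.5 units, slightly below the rounded figure quoted in the text, while the upcross period agrees with 11.6.

```python
import math


def autoregressive_period(a, b):
    """2*pi / arccos(-a / (2*sqrt(b))), the period of the damped correlogram."""
    return 2 * math.pi / math.acos(-a / (2 * math.sqrt(b)))


def upcross_period(a, b):
    """2*pi / arccos(-a / (1 + b)), the mean distance between upcrosses."""
    return 2 * math.pi / math.acos(-a / (1 + b))


# u_{t+2} = 1.2 u_{t+1} - 0.4 u_t + e_{t+2}, i.e. a = -1.2, b = 0.4
long_period = autoregressive_period(-1.2, 0.4)
short_period = upcross_period(-1.2, 0.4)
# Ratio of the two periods for a = -1, b = 0.5
ratio = autoregressive_period(-1.0, 0.5) / upcross_period(-1.0, 0.5)
```

When b = 1 the two arguments of the inverse cosine coincide and the periods are identical.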


30.30. The equation

$$u_{t+2} = -a u_{t+1} - b u_t + \varepsilon_{t+2}$$

may be regarded as expressing the regression of $u_{t+2}$ on $u_{t+1}$ and $u_t$, the term $\varepsilon_{t+2}$ being
a residual error. We may therefore estimate the constants $a$ and $b$ from the regression
equation of the observed series in the usual way. If we assume that the series is long enough
for end effects to be negligible in determining the variances of the finite series, then
$\operatorname{var} u_t = \operatorname{var} u_{t+1} = \operatorname{var} u_{t+2}$ and from the usual formulae for regressions we find

$$a = -\frac{r_1 (1 - r_2)}{1 - r_1^2} \tag{30.34}$$

$$b = -1 + \frac{1 - r_2}{1 - r_1^2}. \tag{30.35}$$

This gives us the constants of the autoregressive scheme from the serial correlations.

It should, however, be realised that these estimates are rather sensitive to superposed
error of the type we refer to below (30.32), and it is therefore unsafe to estimate the
autoregressive period from them. The correlogram itself appears to be a safer guide on
this matter.


Example 30.6 

Consider again the sheep data of Table 30.7 and Fig. 30.6. Suppose we have decided,
from the appearance of the correlogram, to attempt to represent the series by an
autoregressive scheme.

In the first place, we have to inquire whether a scheme of the simple linear form (30.3)
is likely to be adequate. Would it, for example, be better to consider the more general form 

or need we take into account curvilinear regressions such as 

'^t+2 +■ ^f+1 + -{- b' -j- £^^2 • 

The first point can be elucidated by the use of partial and multiple correlations. The
following are the partial coefficients and the function of the multiple correlation $1 - R^2$,
as determined by the continued product of $(1 - r^2)$ (cf. vol. I, equation 15.45,
p. 380):

Order of Partial      Value of Partial
  Correlation.          Correlation.         Π(1 - r²).

     12                    0.595               0.6460
     13.2                 -0.782               0.2509
     14.23                 0.097               0.2485
     15.234               -0.183               0.2402
     16.2345               0.031               0.2400
     17.23456              0.014               0.2400

Evidently no appreciable gain in representation is to be obtained by taking the regression
on more than the two preceding terms.

The possibility as to better representation by taking curvilinear regressions may be 
considered by drawing the scatter diagrams of on Ut+i and Ui on These are 

shown in Fig. 30.8. It seems clear that there is an essential scatter in the data which no
ordinary polynomial can represent, and that curvilinear terms are unlikely to add anything
material to the linear regressions.

We conclude that if the data are of the autoregressive type it is unnecessary to consider
any more elaborate scheme than the simple type

$$u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}.$$

For this series we have

$$r_1 = 0.595, \quad r_2 = -0.151.$$

Hence

$$a = -\frac{r_1 (1 - r_2)}{1 - r_1^2} = -1.060,$$

$$b = -1 + \frac{1 - r_2}{1 - r_1^2} = 0.782.$$

The autoregression equation is

$$u_{t+2} = 1.060 u_{t+1} - 0.782 u_t + \varepsilon_{t+2}.$$

For the autoregressive period we have

$$\cos \theta = \frac{1.060}{2\sqrt{0.782}} = 0.600, \quad \theta = 53.2°$$

and hence the period is $360/53.2 = 6.8$ years.

Now in the correlogram (Fig. 30.6) there are peaks at $k$ = 7, 17 and 25, giving a period
of about 9 years; and there are troughs at $k$ = 3, 13, 21 and 28, giving a mean period
of 8.3 years. The autoregressive period as estimated from the correlogram is then between
8 and 9 years, whereas that given by the autoregression equation is 6.8 years, considerably
shorter.

Using the values of $a$ and $b$ found above, we have for the mean distance between
upcrosses,

$$\cos \phi = \frac{1.060}{1.782} = 0.5948, \quad \phi = 53.5°,$$

giving a mean distance practically equal to the autoregressive period as shown by the
regression equation.

Finally, looking to the original series, we see that there are nine major peaks, the
first in 1874 and the last in 1932, so that the mean distance between peaks is 58/8 = 7.25
years; and nine upcrosses, the first between 1872 and 1873 and the last between 1930 and
1931, so that the mean distance between upcrosses is 58/8 = 7.25 years, the same as for peaks.
The upcross at 1876-7, however, is due to a temporary fall below the zero line, and had it
not occurred we should have found a mean distance of 8.3 years.

We have therefore reached this position : the mean period in the series itself appears 
to be about 7-25 years ; that given by the regression constants is 6-8 years ; and that given 
by the correlogram is about 8-5 years. These figures are scarcely close enough for comfort, 
and further data would be required to arrive at a more accurate estimate of the mean 
period. Nevertheless, they illustrate very well the kind of divergence which appears to 
be more the rule than the exception in dealing with short series. We should expect the 
correlogram to give a higher value than the series itself, for there may appear peaks or 
upcrosses in the latter which are purely temporary fluctuations due to the casual element. 
On the other hand, the regression constants appear to give consistently lower values for
the autoregressive period than the correlogram, an effect found by Yule (1927a) for sunspots,
Wold (1938a) for cost-of-living indices, and Kendall (1944a) in series of agricultural prices,
acreage and livestock populations.
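
The regression constants and the residual-variance products of Example 30.6 can be recomputed from the quoted correlations. The sketch below is illustrative; the function names are assumptions, and the partial correlations are taken as printed (rounded to three places), so that the products may differ from the tabulated ones in the fourth decimal.

```python
import math


def ar2_constants(r1, r2):
    """Constants a, b of u_{t+2} = -a*u_{t+1} - b*u_t + e_{t+2} from the first
    two serial correlations."""
    a = -r1 * (1 - r2) / (1 - r1 ** 2)
    b = -1 + (1 - r2) / (1 - r1 ** 2)
    return a, b


def autoregressive_period(a, b):
    """2*pi / arccos(-a / (2*sqrt(b)))."""
    return 2 * math.pi / math.acos(-a / (2 * math.sqrt(b)))


def continued_products(partials):
    """Running products of (1 - r^2) over successive partial correlations."""
    out, prod = [], 1.0
    for r in partials:
        prod *= 1 - r * r
        out.append(prod)
    return out


a, b = ar2_constants(0.595, -0.151)          # sheep data: about -1.060 and 0.782
period = autoregressive_period(a, b)         # about 6.8 years
products = continued_products([0.595, -0.782, 0.097, -0.183])
```

The second partial correlation, -0.782, is numerically equal to the constant b with its sign changed, as it should be for a regression on two preceding terms.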

30.31. Let us examine more closely the effect referred to at the end of the previous 
example. Our autoregressive system is based on a random element which is added to
the term $u_{t+2}$. We can therefore regard the value at time $t + 2$ as composed of two parts,
a systematic element expressed by $au_{t+1} + bu_t$, giving the effect of the past history of the
system at times $t + 1$ and $t$, together with a new random element peculiar to the moment.
This latter is random in the sense that it is casual and unpredictable ; but once it has 
occurred it is incorporated into the motion of the system and exerts an influence on future
history. It is therefore quite unlike an error of observation or a sampling error which
distorts the value of a particular member but does not affect the others.

Now suppose that such an error of observation is present, and let us represent it by
$\eta$. For long series this element will increase the variance of the observed values by var $\eta$,
but if it is independent of the remaining constituents of the series it will not affect the
covariances. Hence the serial correlations will all be reduced in a constant proportion $c$,
except of course $r_0$; and this, as we proceed to show, will affect the autoregressive period
as derived from the regression constants, in general shortening the period quite considerably.


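30.32. If $r_1$ is reduced to $c r_1$ and $r_2$ to $c r_2$, the constants of the regression equations
are, from (30.34) and (30.35),

$$a' = -\frac{c r_1 (1 - c r_2)}{1 - c^2 r_1^2}, \qquad b' = \frac{c^2 r_1^2 - c r_2}{1 - c^2 r_1^2}. \tag{30.36}$$

The estimated autoregressive period is then that corresponding to $\theta'$, given by

$$\cos \theta' = -\frac{a'}{2\sqrt{b'}} = \frac{c r_1 (1 - c r_2)}{2\sqrt{(1 - c^2 r_1^2)(c^2 r_1^2 - c r_2)}}. \tag{30.37}$$

Differentiating the logarithm of this expression and putting $c = 1$, we find

$$-\tan \theta \left(\frac{d\theta'}{dc}\right)_{c=1} = 1 - \frac{r_2}{1 - r_2} + \frac{r_1^2}{1 - r_1^2} - \frac{2 r_1^2 - r_2}{2 (r_1^2 - r_2)},$$

which reduces to

$$\tan \theta \left(\frac{d\theta'}{dc}\right)_{c=1} = -\frac{(1 + b)(3b^2 + b - a^2)}{2b \{(1 + b)^2 - a^2\}}. \tag{30.38}$$

Now $\tan \theta = -\sqrt{4b - a^2}/a$ and the period $P = 2\pi/\theta$. We then find

$$\left(\frac{dP}{dc}\right)_{c=1} = -\frac{P^2 a (1 + b)(3b^2 + b - a^2)}{4\pi b \{(1 + b)^2 - a^2\} \sqrt{4b - a^2}}. \tag{30.39}$$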

This equation gives us an approximate idea of the change in the period $P$ for small
changes in $c$ near $c = 1$. For instance, with $a = -1.5$, $b = 0.9$ we find $P = 9.7$ units,
and from (30.39),

$$\left(\frac{dP}{dc}\right)_{c=1} = 16.5.$$

Thus, if $c = 0.9$, i.e. the variance of $\eta$ is about 10 per cent. of the total, the period will be
reduced by about 1.65 years, a substantial amount.
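
The effect may be checked numerically. The following Python sketch is illustrative only: the function names are our own, the regression constants are taken in the form $a = -r_1(1 - r_2)/(1 - r_1^2)$, $b = (r_1^2 - r_2)/(1 - r_1^2)$, and the analytic slope at $c = 1$ is compared with a finite-difference estimate. The figures quoted in the text are rounded, so the values computed in this way need not agree with them to the last digit.

```python
import math


def regression_constants(r1, r2, c=1.0):
    """Constants a, b fitted from serial correlations shrunk by the factor c."""
    r1c, r2c = c * r1, c * r2
    a = -r1c * (1.0 - r2c) / (1.0 - r1c ** 2)
    b = (r1c ** 2 - r2c) / (1.0 - r1c ** 2)
    return a, b


def period(a, b):
    """Autoregressive period P = 2*pi/theta, where cos(theta) = -a / (2*sqrt(b))."""
    theta = math.acos(-a / (2.0 * math.sqrt(b)))
    return 2.0 * math.pi / theta


def slope_at_c_equal_1(a, b):
    """Analytic value of dP/dc at c = 1."""
    p = period(a, b)
    num = p ** 2 * a * (1.0 + b) * (3.0 * b ** 2 + b - a ** 2)
    den = 4.0 * math.pi * b * ((1.0 + b) ** 2 - a ** 2) * math.sqrt(4.0 * b - a ** 2)
    return -num / den


# Constants of the worked example; r1 and r2 are the serial correlations they imply.
a, b = -1.5, 0.9
r1 = -a / (1.0 + b)
r2 = -a * r1 - b

# Central finite difference of P with respect to c at c = 1.
h = 1e-6
p_plus = period(*regression_constants(r1, r2, 1.0 + h))
p_minus = period(*regression_constants(r1, r2, 1.0 - h))
numeric_slope = (p_plus - p_minus) / (2.0 * h)
analytic_slope = slope_at_c_equal_1(a, b)
period_shrunk = period(*regression_constants(r1, r2, 0.9))

print("P at c = 1      :", round(period(a, b), 2))
print("analytic dP/dc  :", round(analytic_slope, 2))
print("numerical dP/dc :", round(numeric_slope, 2))
print("P at c = 0.9    :", round(period_shrunk, 2))
```

The last line evaluates the period directly at $c = 0.9$, which shows how far the linear approximation can be trusted for a change in $c$ of this size.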


30.33. It is thus possible that the observed discrepancies between the autoregressive 
periods as given by the regression constants and the correlogram may be due to superposed 
random fluctuation which is not incorporated into the autoregressive scheme. This is 
not the only possible explanation ; for instance, in particular cases the disturbance function
$\varepsilon$ may not be random. The hypotheses to be considered in such a case, however, are so
complex that it is difficult to pursue a quantitative investigation without a wealth of
material ; and this, unfortunately, is usually denied to us, at least in economic work.
Meteorological data are more numerous, and we may hope that further light will be thrown
on the autoregressive scheme by a re-examination of the material available in this field.

30.34. Consider now the more extended autoregression equation

$$u_{t+m} + a_1 u_{t+m-1} + a_2 u_{t+m-2} + \ldots + a_m u_t = \varepsilon_{t+m}. \tag{30.40}$$

The explicit solution cannot be given in the simple form available when $m = 2$. It has,
in general, the solution

$$u_t = A_1 \alpha_1^t + A_2 \alpha_2^t + \ldots + A_m \alpha_m^t + B, \tag{30.41}$$

where $\alpha_1 \ldots \alpha_m$ are the roots of

$$\alpha^m + a_1 \alpha^{m-1} + a_2 \alpha^{m-2} + \ldots + a_m = 0 \tag{30.42}$$

and $B$ is a particular integral involving the $\varepsilon$'s. For the series to be oscillatory without
increasing indefinitely no term such as $\alpha^t$, where $\alpha$ is real and greater than unity, can appear.
Assuming this to be so, and assuming further that the series was "started up" some time
before $t = 0$, we reduce the solution to the particular integral $B$.

Choose a particular value $\xi_t$ of $\sum_{j=1}^{m} A_j \alpha_j^t$, such that

$$\left. \begin{aligned} \xi_0 &= 0 \\ \xi_1 + a_1 \xi_0 &= 1 \\ \xi_2 + a_1 \xi_1 + a_2 \xi_0 &= 0 \\ &\;\;\vdots \\ \xi_{m-1} + a_1 \xi_{m-2} + \ldots + a_{m-1} \xi_0 &= 0 \end{aligned} \right\} \tag{30.43}$$

This is always possible in general, for it imposes $m$ conditions on the $m$ constants $A$. Then
it will be found on substitution that a particular integral $B$ is given by

$$u_t = \sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j+1}, \tag{30.44}$$

a generalisation of (30.24). Our series may then be regarded as generated by a moving
average of infinite extent, the weights being combinations of damped harmonic and
exponential terms.
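
The moving-average representation can be illustrated numerically. The sketch below is ours and rests on stated assumptions: the weights $\xi_j$ are generated from $\xi_0 = 0$, $\xi_1 = 1$ and the homogeneous recurrence (weights with non-positive subscripts being taken as zero), the series is started from rest at $t = 0$, and the constants $a_1 = -1.1$, $a_2 = 0.5$ are chosen purely for illustration.

```python
import random


def moving_average_weights(coeffs, count):
    """Weights xi_0 .. xi_{count-1} for u_{t+m} + a_1 u_{t+m-1} + ... + a_m u_t = eps_{t+m}.

    coeffs is [a_1, ..., a_m]; weights with negative subscripts count as zero.
    """
    xi = [0.0, 1.0]
    for j in range(2, count):
        total = 0.0
        for i, a_i in enumerate(coeffs, start=1):
            if j - i >= 0:
                total += a_i * xi[j - i]
        xi.append(-total)
    return xi[:count]


random.seed(1)
coeffs = [-1.1, 0.5]          # illustrative second-order scheme
n = 60
eps = [random.gauss(0.0, 1.0) for _ in range(n)]

# Generate the series directly from the autoregression, starting from rest.
u = []
for t in range(n):
    value = eps[t]
    for i, a_i in enumerate(coeffs, start=1):
        if t - i >= 0:
            value -= a_i * u[t - i]
    u.append(value)

# Rebuild the last term as a weighted sum of the disturbances.
xi = moving_average_weights(coeffs, n + 1)
t = n - 1
u_from_weights = sum(xi[j] * eps[t - j + 1] for j in range(1, t + 2))

print("first weights:", [round(w, 4) for w in xi[:6]])
print("difference   :", abs(u[t] - u_from_weights))
```

The two routes give the same value apart from rounding error, and the weights themselves die away in a damped oscillation.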

30.35. The correlogram of such a series may be determined by the following method,
due to Walker (1931). Multiply (30.40) by $u_{t-k}$ and sum. We find

$$r_{k+m} + a_1 r_{k+m-1} + a_2 r_{k+m-2} + \ldots + a_m r_k = \frac{\operatorname{cov}(u_{t-k}, \varepsilon_{t+m})}{\operatorname{var} u}. \tag{30.45}$$

Now $u_{t-k}$ depends only on $\varepsilon_{t-k}$ and terms with lower subscripts and hence is uncorrelated
with $\varepsilon_{t+m}$ for $k > -m$. Thus we have

$$r_{k+m} + a_1 r_{k+m-1} + \ldots + a_m r_k = 0, \qquad k > -m. \tag{30.46}$$

If we multiply (30.40) by $u_{t+m+k}$ we find similarly

$$r_k + a_1 r_{k+1} + \ldots + a_m r_{k+m} = \frac{\operatorname{cov}(u_{t+m+k}, \varepsilon_{t+m})}{\operatorname{var} u}, \tag{30.47}$$

but the expression on the right no longer vanishes. In fact $u_{t+m+k}$ contains the term
$\xi_{k+1} \varepsilon_{t+m}$ ; hence

$$r_k + a_1 r_{k+1} + \ldots + a_m r_{k+m} = \xi_{k+1} \frac{\operatorname{var} \varepsilon}{\operatorname{var} u}, \qquad k > -m. \tag{30.48}$$

From (30.46) it follows that the serial correlation will be given by

$$r_k = \sum_{j=1}^{m} (A_j \alpha_j^k), \tag{30.49}$$

where the $\alpha$'s are the roots of (30.42) and the $A$'s are constants to be determined from initial
conditions. Thus the correlogram will be the sum of terms which either decay exponentially
to zero ($\alpha$ real) or oscillate with a similar decay to zero ($\alpha$ complex). Walker (1931) has
used this result in an inquiry into a series of atmospheric pressures.
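
The recurrence (30.46) gives a direct way of computing the theoretical correlogram. The sketch below is a minimal illustration for the second-order case only, with a function name and constants of our own choosing; it assumes $r_0 = 1$ and $r_{-1} = r_1$, so that putting $k = -1$ gives $r_1 = -a/(1 + b)$, and then applies the recurrence for higher lags.

```python
def correlogram_second_order(a, b, max_lag):
    """Serial correlations r_0 .. r_max_lag of u_{t+2} + a u_{t+1} + b u_t = eps_{t+2}."""
    r = [1.0, -a / (1.0 + b)]
    for k in range(2, max_lag + 1):
        # r_k + a r_{k-1} + b r_{k-2} = 0
        r.append(-a * r[k - 1] - b * r[k - 2])
    return r[: max_lag + 1]


r = correlogram_second_order(-1.1, 0.5, 40)   # illustrative constants
for lag in range(0, 13):
    print(lag, round(r[lag], 4))
```

With these constants the roots of (30.42) are complex, and the computed values oscillate about zero with steadily diminishing amplitude.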


The Autocorrelation Function

30.36. If we have a series $u(t)$ defined at every point of time in some range $-h$ to
$+h$, we may define its variance as

$$\operatorname{var} u = \frac{1}{2h} \int_{-h}^{h} u^2(t)\, dt, \tag{30.50}$$

on the assumption that the mean value is zero, which does not limit our generality. Suppose
the series is reduced to standard measure by dividing throughout by the square root
of this variance. Then an evident generalisation of the serial correlation is given by

$$r(k) = \frac{1}{2h} \int_{-h}^{h} u(t)\, u(t + k)\, dt. \tag{30.51}$$

We shall call this the autocorrelation function. We can likewise regard it as defined when
$h$ tends to infinity, provided that the limit on the right in (30.51) exists. It is to be noted
that $r(k)$ is in that case an even function of $k$.


30.37. We shall also consider the function

$$R(k) = \int_{-\infty}^{\infty} u(t)\, u(t + k)\, dt, \tag{30.52}$$

when it exists. We have

$$\int_{-\infty}^{\infty} e^{ipk} R(k)\, dk = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{ipk} u(t + k)\, u(t)\, dt\, dk.$$

The simple substitution $t + k = q$ reduces this to

$$\int_{-\infty}^{\infty} e^{ipq} u(q)\, dq \int_{-\infty}^{\infty} e^{-ipt} u(t)\, dt.$$

Thus, if we write

$$\alpha(p) + i\beta(p) = \int_{-\infty}^{\infty} e^{ipq} u(q)\, dq, \tag{30.53}$$

we have

$$\int_{-\infty}^{\infty} e^{ipk} R(k)\, dk = \alpha^2(p) + \beta^2(p). \tag{30.54}$$

It follows, as is otherwise evident from the fact that $R(k)$ is an even function, that the
imaginary part on the left of (30.54) vanishes, and we have

$$\int_{-\infty}^{\infty} R(k) \cos kp\, dk = \alpha^2(p) + \beta^2(p). \tag{30.55}$$

If, following the notation of characteristic functions, we write $\phi_R(p)$ for the integral on
the left in (30.54) and $\phi_u(p)$ for that on the right in (30.53), we have

$$\phi_R(p) = |\phi_u(p)|^2. \tag{30.56}$$

We may then put

$$\phi_u(p) = \sqrt{\phi_R}\, e^{i\rho}, \tag{30.57}$$

where $\rho$ is an arbitrary real function of $p$. We shall then have

$$u(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itp} \phi_u(p)\, dp = \frac{1}{2\pi} \int_{-\infty}^{\infty} \sqrt{\phi_R}\, \exp\{i(\rho - tp)\}\, dp. \tag{30.58}$$

Since $u(t)$ must be real, the imaginary part vanishes and this is equivalent to

$$u(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \sqrt{\phi_R}\, \cos(\rho - tp)\, dp, \tag{30.59}$$

and $\rho$ must be an odd function of $p$. The result is due to Wiener (1930). It shows that
the autocorrelation function $R$ does not uniquely determine $u(t)$ because of the arbitrary
function $\rho$.


30.38. Consider now the autocorrelation function $r(k)$ as defined in (30.51). Let
us regard the series as defined but equal to zero outside the range $-h$ to $+h$.
Then we have

$$2h\, r(k) = \int_{-h}^{h} u(t)\, u(t + k)\, dt = \int_{-\infty}^{\infty} u(t)\, u(t + k)\, dt = R(k), \tag{30.60}$$

where $R$ and $r$ are zero outside the range $-2h$ to $2h$. The foregoing results then continue
to hold with some modifications concerning factors in $2h$. If we write

$$\phi_r(p) = \int_{-2h}^{2h} e^{ipk} r(k)\, dk = \frac{1}{2h} \int_{-\infty}^{\infty} e^{ipk} R(k)\, dk \tag{30.61}$$

and

$$\phi_u(p) = \frac{1}{\sqrt{2h}} \int_{-h}^{h} e^{ipt} u(t)\, dt = \frac{1}{\sqrt{2h}} \int_{-\infty}^{\infty} e^{ipt} u(t)\, dt, \tag{30.62}$$

then corresponding to (30.56) we have

$$\phi_r(p) = |\phi_u(p)|^2. \tag{30.63}$$

We may now let $h$ tend to infinity and observe that the results continue to hold under
certain general conditions, provided that the limits exist.


Example 30.7 

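Consider the series

$$u(t) = A_1 \sin(\lambda_1 t + \alpha_1) + A_2 \sin(\lambda_2 t + \alpha_2) + \ldots + A_m \sin(\lambda_m t + \alpha_m).$$

For the variance we have

$$\lim_{h \to \infty} \frac{1}{2h} \int_{-h}^{h} u^2(t)\, dt = \lim_{h \to \infty} \frac{1}{2h} \int_{-h}^{h} \sum_j \{A_j^2 \sin^2(\lambda_j t + \alpha_j)\}\, dt,$$

since the cross-product terms will contribute only a finite amount to the integral and hence
vanish in the limit. This is equal to

$$\lim_{h \to \infty} \frac{1}{2h} \int_{-h}^{h} \tfrac{1}{2} \sum_j [A_j^2 \{1 - \cos 2(\lambda_j t + \alpha_j)\}]\, dt = \tfrac{1}{2} \sum_j A_j^2.$$

Similarly for the mean product of $u(t)$ and $u(t + k)$ we have

$$\lim_{h \to \infty} \frac{1}{2h} \int_{-h}^{h} \Big[\sum_j \{A_j \sin(\lambda_j t + \alpha_j)\}\Big] \Big[\sum_j \{A_j \sin(\lambda_j t + \lambda_j k + \alpha_j)\}\Big]\, dt$$

$$= \lim_{h \to \infty} \frac{1}{2h} \int_{-h}^{h} \tfrac{1}{2} \sum_j \{A_j^2 [\cos \lambda_j k - \cos\{\lambda_j (2t + k) + 2\alpha_j\}]\}\, dt = \tfrac{1}{2} \sum_j A_j^2 \cos \lambda_j k.$$

Thus

$$r(k) = \frac{\sum_j (A_j^2 \cos \lambda_j k)}{\sum_j A_j^2}.$$

The correlogram is the sum of a series of harmonics, like the original series, but the
coefficients are different and the harmonics are all in phase.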
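
The result of this example is easily verified numerically. The sketch below is ours and purely illustrative: the amplitudes, angular frequencies and phases are arbitrary choices, the integrals are replaced by means over a long, finely spaced grid, and the observed ratio of mean products is compared with $\sum A_j^2 \cos \lambda_j k / \sum A_j^2$.

```python
import math


def lagged_mean_product(series, lag):
    """Mean of series[t] * series[t + lag] over all available t."""
    n = len(series) - lag
    return sum(series[t] * series[t + lag] for t in range(n)) / n


amplitudes = [2.0, 1.0]        # A_j (illustrative)
frequencies = [0.7, 1.9]       # lambda_j, in radians per unit time (illustrative)
phases = [0.3, 2.1]            # alpha_j (illustrative)
step = 0.1                     # spacing of the grid approximating the integral
n_points = 100_000

u = [
    sum(amp * math.sin(lam * i * step + alpha)
        for amp, lam, alpha in zip(amplitudes, frequencies, phases))
    for i in range(n_points)
]

lag_steps = 15
k = lag_steps * step
observed = lagged_mean_product(u, lag_steps) / lagged_mean_product(u, 0)
expected = (
    sum(amp * amp * math.cos(lam * k) for amp, lam in zip(amplitudes, frequencies))
    / sum(amp * amp for amp in amplitudes)
)

print("observed r(k):", round(observed, 4))
print("formula  r(k):", round(expected, 4))
```

Changing the phases leaves the observed value substantially unaltered, which is the sense in which the harmonics of the correlogram are all in phase.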

30.39. The idea underlying the autoregressive scheme of representing time-series 
may perhaps be best illustrated by an analogy. Imagine a motor-car proceeding along 
a horizontal road with an irregular surface. The car is fitted with springs which permit 
it to oscillate to some extent but are designed to damp out the oscillations as soon as the 
comfort of the passengers will permit. If the car strikes a bump or a pothole in the road 
the body will oscillate up and down for a time but will soon come to rest so far as vertical 
motion is concerned. If, however, it proceeds over a continual succession of bumps there 
will be continual oscillation of varying amplitude and distance between peaks. The oscilla- 
tions are continually renewed by disturbances, though the distribution of the latter along 
the road may be quite random. The regularity of the motion is determined by the internal 
structure of the car ; but the existence of the motion is determined by external impulses.

30.40. It appears to me very plausible to suppose that oscillations in time-series 
are generated in this way. One does not have to postulate some external rhythmic influence 
which keeps the oscillation going, or to suppose that the system will oscillate without 
damping once it has been set in motion. Nor is it necessary to assume that the majority 
of the deviations between theory and observation are due to "errors" which exert no
effect on the subsequent movement of the system. The reader, however, will have to
form his own opinion on this matter.* We now proceed to examine an alternative scheme
of representation in which the series is represented as a sum of (undamped) cyclic terms.

Periodogram Analysis

30.41. It is well known that under certain general conditions a function $f(t)$ can be
expanded in the Fourier series, valid in a certain range,

$$\begin{aligned} f(t) = {} & a_0 + a_1 \cos \frac{\pi t}{\lambda_1} + a_2 \cos \frac{2\pi t}{\lambda_1} + a_3 \cos \frac{3\pi t}{\lambda_1} + \ldots \\ & + b_0 + b_1 \sin \frac{\pi t}{\lambda_1} + b_2 \sin \frac{2\pi t}{\lambda_1} + b_3 \sin \frac{3\pi t}{\lambda_1} + \ldots \end{aligned} \tag{30.64}$$

* The scheme considered in this chapter may over-simplify natural conditions in that it assumes
finite random disturbances at equidistant time-intervals. If the intervals are not equal, or if the
disturbances are small and continually occurring, the autoregressive scheme is only an approximation.
Much remains to be done on this subject.

Functions which are not periodic can be expanded in this way ; for instance, in the
range $0 < x < \pi$,

$$\frac{x}{2} = \sin x - \frac{1}{2} \sin 2x + \frac{1}{3} \sin 3x - \frac{1}{4} \sin 4x + \ldots$$

The function, of course, repeats itself in the range $\pi < x < 2\pi$, and so on.
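
A short numerical check of this expansion may be helpful. The sketch below is ours; the point $x = 1$ and the number of terms are arbitrary choices, and the series is simply summed term by term.

```python
import math


def partial_sum(x, terms):
    """Sum of the first `terms` terms of sin x - (1/2) sin 2x + (1/3) sin 3x - ..."""
    return sum((-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, terms + 1))


x = 1.0
for terms in (10, 100, 2000):
    print(terms, round(partial_sum(x, terms), 5), "target", x / 2)

approx = partial_sum(x, 2000)
shifted = partial_sum(x + 2.0 * math.pi, 2000)   # the sum has period 2*pi in x
```

The convergence is slow, the error after $n$ terms being of order $1/n$, which is typical of the series for a function with a discontinuity at the end of the range.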

As a representation of observed series the Fourier series is rather restricted in scope,
since the period of every term is a submultiple of the fundamental period $2\lambda_1$. A more general
scheme is provided by the series

$$\begin{aligned} f(t) = {} & a_0 + a_1 \cos \frac{2\pi t}{\lambda_1} + a_2 \cos \frac{2\pi t}{\lambda_2} + \ldots \\ & + b_0 + b_1 \sin \frac{2\pi t}{\lambda_1} + b_2 \sin \frac{2\pi t}{\lambda_2} + \ldots \end{aligned} \tag{30.65}$$

or the alternative form

$$f(t) = A_0 + A_1 \cos\left(\frac{2\pi t}{\lambda_1} + \alpha_1\right) + A_2 \cos\left(\frac{2\pi t}{\lambda_2} + \alpha_2\right) + \ldots \tag{30.66}$$

Here the $\lambda$'s are not necessarily commensurable. The object of our analysis is first of all
to find out what are the best values of the $\lambda$'s to select, and secondly to evaluate the other
constants $a$ and $b$, or $A$ and $\alpha$.


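30.42. Suppose we wish to test whether a time-series contains a harmonic term with
period $\mu$. Consider the series

$$A = \frac{2}{n} \sum_{j=1}^{n} u_j \cos \frac{2\pi j}{\mu}, \tag*{(30.67)*}$$

$$B = \frac{2}{n} \sum_{j=1}^{n} u_j \sin \frac{2\pi j}{\mu}, \tag{30.68}$$

and write

$$S^2 = A^2 + B^2. \tag{30.69}$$

Suppose that the series is in fact given by

$$u_j = a \sin \frac{2\pi j}{\lambda} + b_j, \tag{30.70}$$

where $b_j$ is a component which we will assume to contain no cyclical element, so that its
correlation with the other component is zero, at least for long series. Then we have

$$A = \frac{2a}{n} \sum_{j=1}^{n} \sin \frac{2\pi j}{\lambda} \cos \frac{2\pi j}{\mu} + \frac{2}{n} \sum_{j=1}^{n} b_j \cos \frac{2\pi j}{\mu},$$

and the second term may be neglected. Thus, writing

$$\alpha = \frac{2\pi}{\lambda}, \qquad \beta = \frac{2\pi}{\mu},$$

we have

$$\begin{aligned} A &= \frac{2a}{n} \sum (\sin \alpha j \cos \beta j) \\ &= \frac{a}{n} \sum \{\sin(\alpha - \beta) j + \sin(\alpha + \beta) j\} \\ &= \frac{a}{n} \left\{ \frac{\sin \tfrac{1}{2}(\alpha - \beta) n \, \sin \tfrac{1}{2}(\alpha - \beta)(n + 1)}{\sin \tfrac{1}{2}(\alpha - \beta)} + \frac{\sin \tfrac{1}{2}(\alpha + \beta) n \, \sin \tfrac{1}{2}(\alpha + \beta)(n + 1)}{\sin \tfrac{1}{2}(\alpha + \beta)} \right\}. \end{aligned} \tag{30.71}$$

For large $n$ this remains small unless $\alpha$ approaches $\beta$ (or $-\beta$, which is essentially the same
situation), and in that case we have

$$A \sim a \sin \tfrac{1}{2}(\alpha - \beta)(n + 1).$$

Similarly,

$$B \sim a \cos \tfrac{1}{2}(\alpha - \beta)(n + 1),$$

so that

$$S^2 = A^2 + B^2 = a^2. \tag{30.72}$$

Thus $S$ remains small unless the "trial" period $\mu$ approaches the real period $\lambda$, and in that
case equals the amplitude $a$.

* Some writers define these sums with $j$ from 0 to $n - 1$. The signs of $A$ and $B$ may then differ
from those given by (30.67) and (30.68), but the intensity and phase are unaffected.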

30.43. Similarly we may expect that if the series consists of a sum of harmonics
with periods $\lambda_1, \lambda_2, \ldots$, $S$ will be small, unless $\mu$ is equal to one of these periods, in
which case it is finite and equal to the amplitude of the term concerned.

This result forms the basis of what is known as periodogram analysis. We select
a number of trial periods for different values of $\mu$ and calculate $S^2$ for each of them. $S^2$,
which is called the intensity, is then exhibited as a function of $\mu$, and graphed as ordinate
against $\mu$ as abscissa. The diagram obtained by joining the points, each to the next, is
called the periodogram. If this figure has peaks at certain values $\lambda_1 \ldots$ and we are
prepared to assume that these are not sampling accidents, the values are the appropriate
periods of harmonic terms and the intensity $S^2$ provides the corresponding amplitudes.
The quantities $A$ and $B$ of (30.67) and (30.68) are obtained incidentally and provide the
phase angles $\alpha$ of (30.66). We shall illustrate the arithmetic processes below.
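
Before turning to real data, the behaviour of the intensity may be illustrated on an artificial series. The sketch below is ours and makes several assumptions that are not in the text: the series is a single sine term of amplitude 3 and period 11 with a superposed random normal element of unit variance, 300 terms are used, and trial periods are taken at intervals of a quarter of a unit from 5 to 20. The function names are hypothetical.

```python
import math
import random


def harmonic_constants(series, mu):
    """The quantities A and B for trial period mu, with j running from 1 to n."""
    n = len(series)
    a_sum = sum(u * math.cos(2.0 * math.pi * j / mu) for j, u in enumerate(series, start=1))
    b_sum = sum(u * math.sin(2.0 * math.pi * j / mu) for j, u in enumerate(series, start=1))
    return 2.0 * a_sum / n, 2.0 * b_sum / n


def intensity(series, mu):
    """The intensity S^2 = A^2 + B^2 for trial period mu."""
    a_val, b_val = harmonic_constants(series, mu)
    return a_val * a_val + b_val * b_val


random.seed(0)
n, amplitude, true_period = 300, 3.0, 11.0
series = [
    amplitude * math.sin(2.0 * math.pi * j / true_period) + random.gauss(0.0, 1.0)
    for j in range(1, n + 1)
]

trial_periods = [5.0 + 0.25 * i for i in range(61)]
periodogram = [(mu, intensity(series, mu)) for mu in trial_periods]
peak_period, peak_intensity = max(periodogram, key=lambda pair: pair[1])

print("peak at trial period:", peak_period)
print("S at the peak       :", round(math.sqrt(peak_intensity), 2))
print("S^2 at period 7     :", round(intensity(series, 7.0), 3))
```

The intensity stands out sharply at the true period, where $S$ is close to the amplitude of the sine term, and is small at trial periods remote from it; the small values found elsewhere give some idea of what the random element alone can produce.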

30.44. Fig. 30.9 shows the periodogram of the wheat-price index data of Table 30.1.
In order not to confuse the diagram for lower values of the trial period we have shown
only the major fluctuations. The length of the series was about 300 years from 1545 to
1844, earlier and later figures shown in Table 30.1 not having been taken into account.
The primary data have been taken from Sir William Beveridge's classical paper (1922) and
are shown in Table 30.9. For practical reasons which will emerge presently, certain trial
periods are taken not over exactly 300 years but over the number $N$ of years shown in
the table. To reduce the figures to comparability, Beveridge therefore multiplied the
sum $A^2 + B^2$ by $N/300$.
TABLE 30.9

Periodogram Analysis of the Beveridge Wheat-Price Index Data of Table 30.1.

(From J.R.S.S., 1922, 85, 412.)

The first observation relates to 1545, except where A and B are given in heavy type.

Columns (left-hand and right-hand panels alike): Period (Years); Number of Years $N$; $A$; $B$; Intensity $N(A^2 + B^2)/300$.


2-000 

300 

-f 0-11 


0-01 

2-667 

312 

- 0-92 

+ 1*20 

2-38 

2-049 

336 

- 0 40 

- 0 09 

0-19 

2-687 

301 

+ 1-23 

~ 0-02 

1-52 

2-054 

304 

-+- 0-48 

- 0-72 

0-77 

2-692 

315 

- 0-04 

+ 0-23 

0-06 

2-061 

340 

+ 0-38 

- 0-57 

0-54 

2-706 

322 

- 0-27 

+ 1-33 

1-97 

2-069 

300 

-1- 0-25 

-f 0-63 

0-46 

2-714 

304 

+ 0-83 

+ M7 

2-10 

2-074 

336 

- 0 61 

+ 0-51 

0-71 

2-727 

300 

-+- 0-86 

+ 1-46 

2-87 

2-080 

312 

+ 0-92 

~ 0-50 

1-14 

2-733 

287 

•f 2-06 

1*.19 

6-16 

2-087 

288 

- 0-52 

- 0-11 

0-27 

2-736 

279 

+ 2-44 

+ 1-23 

7-82 

2-095 

308 

~ 0-91 

”1" 0*90 

1-69 

2-737 

312 

+ 2-23 

+ 1-00 

6-22 

2-105 

320 

-4 0-90 

+ 0-07 

0-86 

2-741 

296 

-1- 2-43 

+ 0-25 

6-86 

2-112 

288 

+ 0-90 

-4- 0-80 

1-38 

2-750 

308 

-+- 0-90 

- 0-84 

1-55 

2-133 

320 

+ 0-89 

0-15 

0-84 

2-762 

348 

- 0-57 

~ 0-04 

0-37 

2-154 

308 

-t- 0-48 

- {*” 0’l23 

0-29 

2-769 

324 

+ 1-49 

+ 0-23 

2-28 

2-182 

288 

+ 1-32 

- 0-59 

1-99 

2-778 

325 

-1- 1-20 

- 0-92 

2-48 

2-200 

308 

- 0-13 

™ o-bo 

0-39 

2-800 

336 

- 1-01 

- 0 19 

1-18 

2-222 

320 

- 0-32 

- 0-62 

()-.52 

2-818 

310 

■f 0-55 

+ 1-07 

1-49 

2-261 

312 

■1 0*50 

- 0-22 

0-31 

2-833 

323 

+ 0-78 

- 0-10 

0-67 

2-286 

320 

.... ().3s 

- 0-85 

0-93 

2 -.846 

296 

+ 0-41 

H- 0-42 

0-34 

2-316 

308 

+ 1-39 

- 1-05 

3-11 

2-857 

320 

■h 0-96 

■f 0-21 

1-03 

2-333 

308 

- 0-10 

-- 0-25 

0*08 

2-875 

322 

■h 0-35 

-f 0-14 

0-15 

2.353 

:i2o 

t 0‘<)0 

1 0-07 

0-86 

2-888 

312 

-1 1-51 

+ 0-21) 

2-43 

2.3(5-l 

312 

-- 0-12 

-- 0-63 

0-43 

2-895 

330 

- 0-69 

- 1-57 

3-21 

2-370 

320 

-1 0-05 

-- 0-28 

0-08 

2-909 

320 

H- 0-70 

- 1-11 

1-84 

2-375 

30-1 

1 0-29 

0-43 

: 0-27 

2-933 

308 

-- 0-04 

+ 0-39 

0-16 

2-381 

300 

0-19 

1-22 

1 -53 

2-947 

336 

0 93 

~ 1-19 

2-57 

2-385 

310 

- 1-00 

0-89 

1 -86 

2-960 

i 296 

- 0-00 

- 1-15 

1-30 

2-391 

330 

1-30 

0-54 

2-18 

3-000 

i 300 

- 0-29 

-- 0-39 

0-23 

2-395 

309 

0-72 

1 0-60 

0-90 

3-040 

304 

1 0-09 

-I- 0-75 

0-58 

2-400 

i 312 

1 0-34 

1- 0-68 

1 0-60 

3-077 

320 

■1 0-05 

1 1-18 

l-,50 

2-412 

328 

0-08 

0-65 

0-47 

3- 1 1 1 

' 336 

1 0-91 

0-44 

1-15 

2-417 

3-18 

i 1 0-63 

1- 0-57 

0-69 

3-143 

308 

1 2-01 

h 0-23 

4-20 

2-435 

336 

1- 0-44 

1 0-01 

0-22 

3-167 

30-1 

1 0-46 

- 1-05 

1-33 

2-452 

304 

1-10 

0-51 

2-23 

3-200 

320 

•f 0-43 

-I- 0-95 

1-16 

2-462 

320 

0-25 

1 1-19 

2-44 

3-217 

296 

-1- 1-25 

+ 0-00 

1-55 

2-476 

312 

- 0-38 

1- 0-35 

0-27 

3-250 

312 

... J .22 

•- 0-47 

1-80 

2-483 

288 

0-07 

1 0-74 

0-53 

3-273 

324 

■ - 0-55 

-1- 1-18 

1-82 

2 -.500 

320 

- 0-24 

1 1-19 

1 -56 

3-286 

322 

0-1 1 

1- 0-99 

1-07 

2-512 

.324 

I 1 0-86 

1 0-39 

0-97 

3-304 

304 

1- 0-13 

■h 0-76 

0-59 

2-516 

312 

! 1 0-45 

-1- 0-2-1 

0-26 

3-333 

320 

I- 0-90 

+ 1-58 

3-64 

2-529 

1 301 

0-19 

- 0-31 

0-13 

3-364 

i 296 

-1- 1-76 

+ 0-98 

4-00 

2-545 

.336 

1 39 

0-81 

2-89 

3-375 

324 

h 0-56 

+ 0-92 

1-24 

2-.555 

322 

' 1 0-38 

■f 0-50 

0-12 

3-385 

308 

-h 0-35 

-h 1-03 

1-21 

2 -.571 

306 

■f 1-25 

0-55 

1-91 

3-400 

323 

1- M2 

h 3-37 

7-41 

2-588 

i .308 

i 1 0-30 

1- 0-43 

0-28 

3-407 

1 276 

i 2-98 

+ 2-81 

14-90 

2-600 

i 312 

1- 1-02 

- 0-39 

i 1 -25 

3-412 

348 

-I- 1-27 

- . 3-98 

; 15-.53 

2-615 

306 

1 - 0-75 

0-24 

i 0-63 

3-417 

328 

-1- 3-08 

- 2-24 

15-84 

2-625 

i 294 

j -- 0-45 

1 1-36 

! 2-01 

3-429 

288 

-h 3-11 

- 1-40 

11-16 

2-643 

i 296 

i + 0-95 

i 

j 

0-62 

1 1-27 

1 

3-444 

310 

-1- 0-09 

... 0-99 

1-03 
TABLE 30.9 — continued.

Columns: Period (Years); Number of Years $N$; $A$; $B$.


3-455 

304 

+ 0-56 

+ 0-29 * 

3-462 

315 

+ 1-57 

+ 1-02 


3-500 

308 

+ 1-20 

- 0-94 


3-524 

296 

_1_ 1-41 

- 1-18 


3-538 

322 

+ 0-50 

- 1-45 


3-556 

320 

+ 0-02 

- 0-43 


3-571 

325 

+ 0-80 

- 0-69 


3-600 

324 

- 1-03 

+ 0-82 


3-619 

304 

+ 1-18 

+ 1-23 


3-636 

320 

+ 1-14 

+ 0-13 


3-643 

306 

- 0-16 

+ 0-27 


3-667 

308 

- 2-14 

- 1-07 


3-679 

309 

+ 0-34 

- 1-90 


3-692 

288 

+ 1-28 

- 0-22 


3-700 

296 

+ 0-90 

- 0-59 


3-714 

312 

+ 1-15 

+ 1-78 


3-727 

287 

- 0-45 

- 1-65 


3-760 

315 

+ 0-64 

- 0-06 


3-778 

306 

- 1-17 

- 0-68 


3-800 

304 

+ 1-60 

+ 0-80 


3-833 

322 

- 1-12 

- 1-63 


3-857 

324 

+ 1-63 

+ 0-45 


3-888 

280 

- 0-15 

+ 0-66 


3-895 

296 

- 0-66 

-f 1-00 


3-923 

306 

+ 0-64 

- 1-61 


3-962 

309 

- 0-67 

+ 1-74 


4-000 

300 

+ 1-47 

- 1-13 


4-077 

318 

+ 0-57 

- 0-26 


4-111 

296 

+ 1-13 

- 1-70 


4-143 

290 

- 0-50 

+ 0-23 


4-167 

325 

+ 1-21 

+ 0-32 


4-173 

322 

+ 0-66 

- 1-46 


4-200 

294 

- 0-99 

- 0-41 


4-250 

323 

+ 0-50 

- 2-73 


4-286 

300 

- 0-65 

+ 0-79 


4-333 

312 

- 1-50 

- 1-30 


4-353 

296 

~ 2-85 

- 0-24 


4-364 

288 

- 2-98 

+ 0-75 


4-375 

315 

- 2-47 

+ 0-87 


4-385 

342 

- 0 - 5 ( 

) 2-55 

► 

4-400 

308 

- 1-38 

+ 3-27 


4-412 

300 

+ 0-08 

+ 3-62 


4-417 

318 

+ 0-87 

’ + 3-85 


4-429 

310 

+ 1-80 

f “|- 2*41 


4-444 

320 

"1" 2*15 

i + 0-83 


4-471 

304 

+ 0-91 

. -f 0-79 

1 

4-500 

306 

+ 1-87 

' + 0-72 

; 1 

4-571 

320 

- 0-21 

-f 0-04 1 

4-600 

322 

- O-OJ 

+ 1-24 


4-667 

336 

+ 0 - 1 ' 

9 + 0 - 9 ; 


4-750 

304 

- 0 -li 

+ 2 - 2 E 


4-800 

288 

+ 2 - 4 ^ 

+ 1-08 


4-857 

306 

- l - 0 ( 

- l - 3 ( 


4-888 

312 

- l - 8 < 

+ 2 - 1 ] 



Intensity $N(A^2 + B^2)/300$.


0 - 39 

4 - 87 

2 - 38 

3 - 31 

2-53 
0-20 

1 - 21 
1-88 

2 - 94 

1-39 
0-10 

5 - 87 

3 - 83 

1 - 63 
M 8 

4 - 65 

2 - 72 

0 - 44 

1 - 86 

3 - 24 
417 

3-08 

0 - 43 

1 - 42 
3-06 
3-59 

3 - 64 
0-41 
413 

0 - 30 

1 - 70 

2 - 77 
1-02 
8-32 
104 

4 - 10 

8 - 05 

9 - 07 
719 

7 - 72 
12-89 
1311 
16-48 

9-32 

5 - 66 
1-48 

4 - 09 
0-22 

1 - 65 
1-00 

5 - 28 

6 - 84 

2 - 89 

8 - 00 


4 - 933 

5 - 000 

5-067 

5 - 091 

5-100 

6 - 111 

6-126 

6-143 

6-200 

6-250 

5-333 

5-400 

5-415 

5 - 429 

6 - 455 

5-500 
5-655 
5-600 
5-667 
5-692 
5-714 

5 - 760 

6 - 800 
5-846 

5 - 933 

6 - 000 

6-111 

6-143 

6-167 

6-200 
6-250 
6-286 
6-333 
6-400 
6-500 
6-671 
6-667 
6-727 
6-760 
6-800 
6-909 

6 - 933 

7 - 000 

7-143 

7-200 

7-333 

7-400 

7-417 

7-429 

7-500 

7-600 

7-667 

7-750 

7-857 


Columns: Number of Years $N$; $A$; $B$; Intensity $N(A^2 + B^2)/300$.


296 I 

+ 1-57 

4 - 1-58 1 

4-91 

300 1 

+ 1-85 

4- 1 - 00 ' 

4-30 

304 

- 0-05 

+ 3-98 

16-09 

336 

- 0 73 

+ 5-55 

35-05 

306 

+ 5-71 

4 - 2-98 

42-34 

322 

+ 5-70 

+ 0-29 

34-91 

328 

+ 3 97 

4- 2 90 

26-38 

324 

“ 1 " 

+ 2-46 

13-09 

312 

+ 0-02 

+ 0-30 

0-10 

294 

+ 1-74 

+ 1-92 

6-56 

320 

+ 0-71 

- 4-46 

21-72 

324 

+ 1-04 

+ 3-71 

16-06 

325 

+ 4-27 

+ 1-90 

23-66 

304 

+ 4-72 

- 0-28 

22-61 

300 

+ 1-37 

- 3-73 

15-76 

308 

- 1-04 

+ 1-49 

3-39 

300 

+ 2-40 

- 0-68 

6-23 

336 

+ 0-46 

4 1 21 

1-88 

306 

+ 5-31 

- 1-97 

32-72 

296 

+ 2-05 

- 3-91 

19-18 

320 

+ 0-35 

- 2-13 

4-97 

322 

+ 1-39 

- 0-33 

2-18 

290 

+ 3-55 

- 2-75 

19-17 

304 

+ 0-00 

- 2-29 

5-35 

356 

4 - 4-37 

4 0-91 

1 23-63 

300 

- 3-50 

- • 0-12 

1 12-29 

330 

- 0-79 

- 1-90 

ii 4-66 

301 

4 - 0-74 

- 2-96 

9-32 

296 

- 0-22 

-- 2-94 

8-56 

310 

- 2-02 

- 3-38 

16-02 

325 

- 3-23 

0-11 

1 1-30 

308 

- ]-72 

0-59 

1 3-11 

304 

- 1-52 

1 1-29 

4-02 

1 

320 

+ O-HO 

4 2-74 

8-71 

312 

+ 0-69 

^ 0-73 

0-94 ! 

322 

+ 1-49 

- ■ 0-77 

, 1 

320 

4 - 0-25 

4 0-21 

: o-ii 

! 

296 

+ 0-08 

-- 0-13 

0-02 i 

324 

- 0-20 

-- 1-66 

3-01 i 

306 

+ 0-23 

-- 0-65 

0-48 ' 

304 

4 - 0-58 

4 2-56 

; 7-00 ' 

312 

+ 1-68 

1 2-01 

7-15 

308 

4- 3-10 

- 2-17 

14 - 7-1 

300 

+ 1-83 

l - 8 (i 

, 6-79 

324 

+ 0-54 

3-93 

; 16-96 

308 

4 - ] -52 

i 2-81 

10-16 

296 

- 2-33 

1 - 2-72 12-65 

356 

4 1 50 4 01 21-72 

312 

‘ - 3-80 

1-49 17-28 

315 

; 4 - 0-17 

1 1-50 2-40 

304 

i - 2-33 

- 1-37 7-43 

322 

! - 1-46 ! -- 2-61 

9-57 

310 

! + 1-38 

- 0-39 2-13 

330 

j ~ 0 - 5 ( 

9 1 0-28 0-36 
Columns: Period (Years); Number of Years $N$; $A$; $B$.


8-000 

312 

~ 3-96 

4 1-34 

8-091 

356 

+ 4-32 

~ 0-98 

8-200 

287 

4 - 1-62 

- 0-64 

8-222 

296 

+ 0-19 

- 0-56 

8-333 

325 

-f 0-21 

4 0-91 

8-500 

323 

-4 0-17 

4 3-19 

8-667 

312 

4 2-51 

- 1-01 

8-800 

308 

4 2-97 

-" I " 0’83 

9-000 

306 

- 1-51 

- 0-57 

9-200 

322 

- 0-16 

- 1-56 

9-333 

336 

- 0-74 

4 0-64 

9-500 

304 

4 1-08 

4 1-07 

9-667 

290 

4 6-03 

4 0-37 

9-760 

312 

4 4-46 

3-56 

9-818 

324 

4 1-21 

- 4-94 

10-000 

320 

- 1-19 

- 0-83 

10-200 

306 

4 0-86 

- 0-22 

10-250 

328 

- 0 69 

4 1 10 

10-400 

312 

4 1-88 

- 1-66 

10-500 

294 

4 2-46 

- 1-82 

10-750 

301 

4 1-47 

- 3-13 

10-800 

324 

4 1-00 

- 4-75 

11-000 

308 

- 3-85 

- 4-26 

11-200 

336 

- 2-48 

4 0-55 

11-500 

322 

- 1-32 

- 0-66 

11-667 

280 

4 0-46 

4 1-42 

12-000 

312 

- 2-47 

- 4-04 

12-143 

340 

~ 0 22 

- 4-37 

12-333 

296 

_ 2-44 

4 2-74 

12-600 

325 

' - 1-22 

4 2-63 

12-667 

304 

-f 2-28 

1 5-19 

12-800 

320 

1- 5-70 

4 3-26 

12-875 

309 

4 6-46 

1- 0-77 

13-000 

312 

4 4-26 

- 4-32 

13-333 

320 

4 0-40 

1 0-37 

13-500 

324 

4 2-56 

- 2-09 

13-667 

328 

4 3-4S 

) ~ 1 34 

14-000 

308 

4 1-15 

- 1-00 

14-600 

290 

- 3-78 

- 0-18 

14-667 

308 

-- 1-50 

■ 4 4-23 

15-000 

300 

4 6-32 

: - 2-66 

1 5-200 

304 

f M 9 

1 - 8-52 

15-250 

305 

- 0-28 

; - 8-65 

15-286 

321 

- 2 - 3 .^ 

; ^ 7-15 

15-333 

322 

■ - 3-811 

1 - 6-55 

15-500 

310 

6-951 

! - 2-02 

16-000 

320 

l - 4 ( 

) 4 4-52 

16-667 

300 

1 - 5-21 

L - 0-39 

17 - 00(1 

1 306 

4 2 - 5 ( 

) - 6-35 

17-333 

1 312 

- 3 - 0 ‘ 

1 - 6-65 


Intensity $N(A^2 + B^2)/300$.


18-67 

23 - 23 
2-90 

0-34 

0 - 95 
10-41 

7-59 

9-77 

2-65 

2-66 

1 - 08 

2-26 

24 - 65 
33-89 
27-90 

2-25 

0-80 

1 - 84 
6-62 
9-19 

11-98 

25 - 48 
33-84 

7-24 

2 - 34 
2-()7 

23-30 
21-66 
1 1 -43 
9-13 
32-58 

46-01 

43-58 
38-23 
0-32 
1 1-79 
15-28 
2-38 
13-82 
20-69 

46 - 83 

75 - 04 

76 - 17 
60-62 
62-29 
59-1 1 
24-02 
27-33 

47 - 84 
54-55 


Number 
of Years 
N . 


17 - 500 

18 - 000 

18 - 500 

19 - 000 

19 - 750 

20 - 000 

21-000 

22-000 

23-000 

24-000 
24-667 

26-000 
26-000 

27 - 000 

28 - 000 

29-000 

30 - 000 

31 - 000 

32 - 000 

33 - 000 

34 - 000 

35 - 000 

36 - 000 

37 - 000 

38 - 000 

40-000 

41 - 000 

42 - 000 

44 - 000 

45 - 000 
4()-000 

48-000 
50-000 

52 - 000 

53 - 000 

54 - 000 

55 - 000 
5<}-000 
58-000 
60-000 
62-000 
64-000 
66-000 
68-000 
70-000 
74-000 
76-000 
78-000 
80-000 
84-000 


280 

306 

296 

304 

316 

320 

294 

308 

322 

288 

296 

326 

312 

324 

308 

290 

300 

310 

320 

330 

306 

280 

288 

296 

304 

320 

328 

294 

308 

315 

322 

288 

300 

312 

318 

324 

330 

336 

290 

300 

310 

320 

330 

340 

280 

296 

304 

312 

320 

336 




Columns: $A$; $B$; Intensity $N(A^2 + B^2)/300$.

- 6-18 

— 4-45 

54-12 

- 4-40 

+ 1*25 

21-29 

- 1-46 

4 2-25 

7-10 

4 1-00 

~ 0-23 

1-07 

- 4-73 

- 1-59 

26-25 

- 5-71 

4 1-69 

37-88 

4 0-78 

4 2-61 

7-28 

4 1-87 

4 1-58 

6-18 

- 2-45 

- 1-43 

8-61 

4 0-45 

4 6-19 

26-10 

4 4-31 

4 1-99 

22-21 

4 3-86 

- 0-19 

14-94 

4 1-23 

- 1-34 

3-43 

4 0-60 

- 0-33 

0-38 

- 0-49 

4 0-68 

0-72 

4 1*08 

- 2-12 

5-46 

- 1-63 

- 2-34 

7-81 

- 1-98 

4 0-13 

4-06 

- 0-37 

4 0-51 

0-42 

4 0-96 

- 0-78 

1-68 

- 3-00 

- 2-16 

13-90 

- 4-64 

4 1-79 

23-11 

- 1-65 

4 4-86 

23-29 

4 2-08 

4 3-92 

19-47 

4 2-99 

4 0-66 

9-37 

— 1-44 

- 0-63 

2-63 

- 1 93 

4 0-93 

5-01 

4 0-93 

4 3-02 

9-75 

4 3-00 

- 0-14 

9-27 

4 1-69 

- 1-99 

7- 14 

-1- 0-16 

- 2-27 

5-58 

- 0-76 

- 0-09 

0-56 

1 1-83 

H 2-19 

8-14 

1- 4-77 

- 0-57 

24-03 

4.22 

- 2-60 

26-08 

4 2-84 

- 4-01 

26-09 

1 - 3-54 

- 3-30 

25-82 

4 3-31 

1 ~ 2 Zl 

) 18-47 

1 - 3-89 

1 4- 1-49 

1 16-82 

- 3-08 

; - 0-93 

: 10-32 

- 1-62 

1 4 0-39 

1 2-88 

- 0-78 

; 4 0-13 

1 0-66 

- 0 - 5 (! 

i - 0-66 

1 0-69 

1 2-9( 

) - 1-85 

3 13-58 

- 0 - 6 <l 

1 ~()- l ( 

; 0-47 

- l - 2 ( 

) 4 0-82 

! 2-07 

- 0-01 

i 4 M 7 

' 1-83 

- 1 - 0-51 

1 4 l - 2 t 

i 2-00 

4 0 - 7 '; 

r 4 0-82 

! 1-34 

4 0-2i 

6 4 0-6* 

9 0-62 

An examination of the periodogram suggests the possibility of 20 periods, as follows :

    Period      Corrected Intensity        Period      Corrected Intensity
    (Years).    N(A² + B²)/300.            (Years).    N(A² + B²)/300.

     2.735            7.82                 11.000           33.84
     3.417           15.84                 12.000           23.30
     4.417           16.48                 12.800           46.01
     5.100           42.34                 15.250           76.17
     5.415           23.66                 17.333           54.55
     5.667           32.72                 20.000           37.88
     5.933           23.63                 24.000           26.10
     7.417           21.72                 35.000           23.29
     8.091           23.23                 64.000           26.09
     9.750           33.89                 68.000           13.58


This is evidently rather an embarrassing profusion of possibilities, and we cannot
immediately accept all these periods as significant. Sir William discussed them in detail
in the original paper and was inclined to attribute reality to 18 or 19 of them, partly on
grounds which do not concern us here, such as the existence of weather oscillations with
these "periods". In particular, where a period had a high intensity he analysed the
two halves of the series separately to see whether the periods persisted, finding that most
of them did.


30.45. An inspection of the correlogram of the series in Fig. 30..') reveals a striking 
difference between the two methods of analysis. From the correlogram we should be 
inclined to suspect a mean period of about 15 years, corresponding to the peak of greatest
intensity in the periodogram, with a subsidiary ripple of about 5 to 6 years' period,
corresponding to one or more of the peaks in the periodogram ; but of the other 18 periods there
is no sign. The conclusion is inevitable that either the correlogram is insensitive or the
periodogram is misleading. Having raised this highly important question we shall,
unfortunately, have to leave it unsettled in part ; but we shall show that at least three-quarters
of the periods thrown up for consideration by the periodogram are not significant.


30.46. The calculation of the intensity depends on that of the quantities $A$ and $B$
of equations (30.67) and (30.68). Suppose in the first place that our trial period $\mu$ is an
integer. We then write down the series in rows of $\mu$, thus :

$$\begin{array}{lcccc} & u_1 & u_2 & \ldots & u_{\mu} \\ & u_{\mu+1} & u_{\mu+2} & \ldots & u_{2\mu} \\ & \vdots & \vdots & & \vdots \\ & u_{(p-1)\mu+1} & u_{(p-1)\mu+2} & \ldots & u_{p\mu} \\ \text{Totals} & m_1 & m_2 & \ldots & m_{\mu} \end{array} \tag{30.73}$$

We continue writing down the rows until there are fewer than $\mu$ terms remaining, the
extra terms being left out of account. The number $p\mu$ is then as near in multiples of $\mu$
as we can get to the number in the series $n$, and may be denoted by $N$. This array is
sometimes known as the Buys-Ballot table.
We then form the sum

$$\frac{2}{p\mu} \left\{ m_1 \cos \frac{2\pi}{\mu} + m_2 \cos \frac{4\pi}{\mu} + \ldots + m_{\mu} \cos 2\pi \right\} \tag{30.74}$$

and this is clearly the quantity $A$ of (30.67) for the series of $N$ terms. Similarly we have

$$B = \frac{2}{p\mu} \left\{ m_1 \sin \frac{2\pi}{\mu} + m_2 \sin \frac{4\pi}{\mu} + \ldots + m_{\mu} \sin 2\pi \right\}. \tag{30.75}$$

If the trial period $\mu$ is a rational fraction with numerator $\nu$, we write the series down in
rows of $\nu$ and proceed in the same way ; and if it is irrational or is a number which gives a
large value of $\nu$ when expressed as a fraction, we take two convenient neighbouring values
of $\mu$ and interpolate in the periodogram.


30.47. In actual practice we do not write down the array (30.73). The sums $m$
may be formed on an adding machine by starting with $u_1$ and then adding every $\mu$th member
to give $m_1$ ; then starting with $u_2$ and adding every $\mu$th member to give $m_2$, and so on.
Or alternatively, the values may be written on cards, one for each member of the series,
and the pack dealt into $\mu$ heaps. The total of the $m$'s, together with any members left
over, equals the sum of the series and provides a check on the work.
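
The procedure may be sketched in a few lines of Python. The code below is ours and illustrative only: the function names are hypothetical, the series of 303 values is artificial, and the trial period 7 is an arbitrary integer. It forms the column totals of the Buys-Ballot array, obtains $A$ and $B$ from them, and compares the results with the sums taken directly over the first $N$ terms.

```python
import math


def buys_ballot_totals(series, mu):
    """Column totals m_1 .. m_mu of the array with rows of mu terms, and the number of rows p."""
    p = len(series) // mu
    totals = [0.0] * mu
    for row in range(p):
        for col in range(mu):
            totals[col] += series[row * mu + col]
    return totals, p


def constants_from_totals(totals, p):
    """A and B computed from the column totals."""
    mu = len(totals)
    scale = 2.0 / (p * mu)
    a_val = scale * sum(m * math.cos(2.0 * math.pi * i / mu) for i, m in enumerate(totals, start=1))
    b_val = scale * sum(m * math.sin(2.0 * math.pi * i / mu) for i, m in enumerate(totals, start=1))
    return a_val, b_val


def constants_direct(series, mu, n_terms):
    """A and B computed term by term over the first n_terms members of the series."""
    scale = 2.0 / n_terms
    a_val = scale * sum(series[j - 1] * math.cos(2.0 * math.pi * j / mu) for j in range(1, n_terms + 1))
    b_val = scale * sum(series[j - 1] * math.sin(2.0 * math.pi * j / mu) for j in range(1, n_terms + 1))
    return a_val, b_val


series = [math.sin(0.9 * j) + 0.01 * j for j in range(1, 304)]   # 303 artificial values
mu = 7
totals, p = buys_ballot_totals(series, mu)
n_used = p * mu                                                   # the number N of terms used

a_fold, b_fold = constants_from_totals(totals, p)
a_direct, b_direct = constants_direct(series, mu, n_used)
check = sum(totals) + sum(series[n_used:])                        # should equal the sum of the series

print("N used:", n_used, " members left over:", len(series) - n_used)
print("A:", round(a_fold, 6), round(a_direct, 6))
print("B:", round(b_fold, 6), round(b_direct, 6))
```

The saving lies in the fact that only $\mu$ multiplications by a cosine and $\mu$ by a sine are required, instead of $N$ of each.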


Example 30.8

Consider the Beveridge series of Table 30.1. For the trial period 2 we may take 300
terms of the series, and $m_1$ (about zero mean) will be the sum of the values $u_1, u_3, u_5, \ldots$
and $m_2$ will be the sum of the values with even subscripts. These sums are, for the years
1545 to 1844 inclusive,

$$m_1 = 14{,}909, \qquad m_2 = 14{,}893.$$

The mean is 14,901, so that about the mean of the series

$$m_1 = +8, \qquad m_2 = -8.$$

Now, for a trial period 2, $\sin \frac{2\pi j}{2}$ vanishes and hence $B = 0$. For $A$ we have (in our
notation, which gives different signs from Beveridge's to $A$ and $B$)

$$
A = \frac{2}{300}\left\{ m_1 \cos\frac{2\pi}{2} + m_2 \cos\frac{4\pi}{2} \right\} = \frac{2}{300}(-8 - 8) = -0.11.
$$

Thus $S^2$ (corrected) $= 0.01$, as shown in Table 30.9.

For a trial period 2.600, we could take $\mu = \frac{13}{5}$ and arrange the series in rows of 13,
requiring 23 rows accounting for 299 values of the series. We may, however, save
ourselves some arithmetic by taking 24 rows, a multiple of 4, occupying 312 observations.
Or rather, we take 6 rows of 52, giving us the values for a trial period 52; then add $m_1$
to $m_{27}$, $m_2$ to $m_{28}$ and so on, giving the result we would have got by taking 12 rows of 26
and hence providing the values for a trial period of 26; then we add again in the same way,
and so on, obtaining successively the values of $m$ required for trial periods of 13, 6.5, and
3.25. Similarly, by multiplying the original 52 values of $m$ by the respective values of
$\cos \frac{2\pi j}{2.600}$ and $\sin \frac{2\pi j}{2.600}$ we get the values of $A$ and $B$ required for a trial period of 2.600. It is
thus evident that we can use the single set of 52 values of $m$ to provide the required constants
for trial periods $\frac{52}{1}$, $\frac{52}{2}$, $\frac{52}{3}$ and so forth. This is the main reason why, in Table 30.9, 312
observations are shown as $N$ for the trial periods 2.080, 2.261, 2.364, 2.476, 2.600, 2.737,
2.888, 3.250, 3.714, 4.333, 5.200, 6.500, 7.429, 8.667, 10.400, 13.000, 17.333, 26.000 and
52.000. The arithmetic, though difficult enough, is not as laborious as appears at first sight.
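The economy described in this example, one set of 52 column sums serving many trial periods, can be checked numerically. The sketch below is illustrative only: the helper names `column_sums`, `fold` and `constants_from_sums` are hypothetical, and the 312-term series is an arbitrary artificial one, not the Beveridge series.

```python
import math

def column_sums(series, width):
    """Totals of the columns when the series is written in complete rows of `width`."""
    rows = len(series) // width
    sums = [0.0] * width
    for j in range(rows * width):
        sums[j % width] += series[j]
    return sums

def fold(sums):
    """Add m_1 to m_(h+1), m_2 to m_(h+2), ... where h is half the number of sums."""
    if len(sums) % 2:
        raise ValueError("fold needs an even number of column sums")
    half = len(sums) // 2
    return [sums[i] + sums[i + half] for i in range(half)]

def constants_from_sums(sums, n_used, period):
    """A and B for a trial period that divides evenly into the row length."""
    a = 2.0 / n_used * sum(m * math.cos(2 * math.pi * (k + 1) / period)
                           for k, m in enumerate(sums))
    b = 2.0 / n_used * sum(m * math.sin(2 * math.pi * (k + 1) / period)
                           for k, m in enumerate(sums))
    return a, b

# Arbitrary artificial series of 312 terms (an assumption for illustration).
series = [float((7 * j * j + 3 * j) % 11) for j in range(1, 313)]

m52 = column_sums(series, 52)     # 6 rows of 52
m26 = fold(m52)                   # same as 12 rows of 26
m13 = fold(m26)                   # same as 24 rows of 13

# Trial period 52/20 = 2.600 from the 52 sums, against the direct sum over all terms.
a_fast, b_fast = constants_from_sums(m52, 312, 52 / 20)
a_direct = 2.0 / 312 * sum(u * math.cos(2 * math.pi * j / 2.6)
                           for j, u in enumerate(series, start=1))
b_direct = 2.0 / 312 * sum(u * math.sin(2 * math.pi * j / 2.6)
                           for j, u in enumerate(series, start=1))
```

The shortcut works because 20 complete cycles of period 2.600 occupy exactly 52 terms, so every member of a column carries the same cosine and sine factor.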

30.48. There is an interesting relation between the periodogram and the correlogram
by which the latter, in theory, determines the former. We consider, as in 30.38, a function
$u(t)$ defined at every point of time in some range $-h$ to $h$. Then the quantity

$$
\frac{1}{h}\int_{-h}^{h} \cos pt \; u(t)\, dt + \frac{i}{h}\int_{-h}^{h} \sin pt \; u(t)\, dt \qquad (30.76)
$$

corresponds to the sums of (30.67) and (30.68) and may be written $A + iB$, where

$$p = \frac{2\pi}{\mu}. \qquad (30.77)$$

It follows that the intensity is related to the Fourier transform of $r(k)$ by the relation,
derived from (30.63),

$$
S^2 = A^2 + B^2 = \frac{2}{h}\int_{-2h}^{2h} r(k) \cos kp \; dk, \qquad (30.78)
$$

which is true also in the limit, subject to conditions of existence. Thus the intensity is,
if $r(k)$ exists over an infinite range, the quantity

$$
\lim_{h \to \infty} \frac{2}{h}\int_{-2h}^{2h} r(k) \cos kp \; dk,
$$

and if $R(k)$ exists the parallel quantity

$$
\int_{-\infty}^{\infty} R(k) \cos kp \; dk.
$$

The periodogram is thus derivable from the autocorrelation function. Since the latter
does not uniquely determine the series the periodogram will not do so either.

Example 30.9

Consider the autocorrelation function, which in present notation may be written

$$R(k) = \frac{p^k \sin (k\theta + \psi)}{\sin \psi}.$$

This, as we have seen, represents the correlogram of an autoregressive series of the simple
linear kind involving $u_{t+1}$ and $u_t$. We may write this as

$$R(k) = \frac{e^{-ak} \sin (k\theta + \psi)}{\sin \psi}, \qquad a > 0,$$

since $p$ is less than unity. It is to be remembered that since $R(-k) = R(k)$, the modulus
of $k$ is to be used when $k$ is negative.

We have

$$
S^2 = \int_{-\infty}^{\infty} \frac{e^{-a|k|} \sin (|k|\theta + \psi)}{\sin \psi} \cos kp \; dk,
$$

and, retaining only the term in $\cos k\theta$ (the term in $\cot \psi \sin k\theta$ is dropped in the result quoted here),

$$
S^2 = 2\int_{0}^{\infty} e^{-ak} \cos k\theta \cos kp \; dk
= \frac{a}{a^2 + (\theta + p)^2} + \frac{a}{a^2 + (\theta - p)^2}.
$$

This is the intensity in the periodogram of the series, $p$ being the quantity $\frac{2\pi}{\mu}$ and not to
be confused with our original damping factor $p$.

It is remarkable that, as $\mu$ becomes large, $S^2$ tends to the constant value $\frac{2a}{a^2 + \theta^2}$,
that is to say, the periodogram tends to a fixed level, without peaks. From the analogy
with the analysis of light-rays into colours (each colour corresponding to a particular
harmonic), we may say that the periodogram develops a "continuous spectrum". In a
very interesting chapter on periodogram analysis Davis (1941) has given a number of
examples exhibiting this kind of effect.
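As a rough numerical check on this example, the sketch below compares the closed form a/(a² + (θ + p)²) + a/(a² + (θ − p)²) with a trapezoidal evaluation of 2∫ e^(−ak) cos kθ cos kp dk over (0, ∞), and confirms that the intensity flattens to 2a/(a² + θ²) for long trial periods. The values of `a` and `theta`, the truncation point and the step count are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def intensity(p, a, theta):
    """Closed form of 2 * integral_0^inf exp(-a k) cos(k theta) cos(k p) dk."""
    return a / (a * a + (theta + p) ** 2) + a / (a * a + (theta - p) ** 2)

def intensity_numeric(p, a, theta, upper=60.0, steps=200000):
    """Trapezoidal approximation of the same integral, truncated at `upper`."""
    h = upper / steps
    def f(k):
        return math.exp(-a * k) * math.cos(theta * k) * math.cos(p * k)
    total = 0.5 * (f(0.0) + f(upper))
    for i in range(1, steps):
        total += f(i * h)
    return 2.0 * h * total

a, theta = 0.5, 1.0                      # assumed damping and angular frequency
p = 0.7                                  # p = 2*pi/mu for some trial period mu
closed = intensity(p, a, theta)
approx = intensity_numeric(p, a, theta)

# For a very long trial period the intensity approaches the constant level.
level = 2 * a / (a * a + theta * theta)
long_period = intensity(2 * math.pi / 1.0e6, a, theta)
```

The agreement between `closed` and `approx` concerns only the cosine part of the correlogram; no claim is made here about the sine part.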


Significance of a Periodogram

30.49. Suppose that the values $u_1, \ldots, u_n$ are random elements from a normal
population with variance $\sigma^2$. Then the function

$$A = \frac{2}{n}\sum_{j=1}^{n} u_j \cos\frac{2\pi j}{\mu}$$

is normally distributed with variance

$$
\operatorname{var} A = \frac{4\sigma^2}{n^2}\sum_{j=1}^{n} \cos^2\frac{2\pi j}{\mu} = \frac{2\sigma^2}{n}, \qquad (30.79)
$$

and similarly

$$\operatorname{var} B = \frac{2\sigma^2}{n}. \qquad (30.80)$$

We also see that cov $(A, B) = 0$ so that $A$ and $B$ are independent. Hence the joint
distribution of $A$ and $B$ is

$$
dF = \frac{n}{4\pi\sigma^2} \exp\left\{ -\frac{n}{4\sigma^2}(A^2 + B^2) \right\} dA\, dB. \qquad (30.81)
$$

Thus the distribution of $S^2 = A^2 + B^2$ is

$$
dF = \frac{n}{4\sigma^2} \exp\left( -\frac{n S^2}{4\sigma^2} \right) dS^2. \qquad (30.82)
$$

The probability that $S^2$ exceeds $\frac{4\sigma^2 \kappa}{n}$ in value is immediately obtainable as $e^{-\kappa}$.


30.50. This result is due to Schuster (1898), but it gives only the probability that
a value of $S^2$ chosen at random will exceed a given value; whereas in the periodogram
we deliberately pick out the biggest values for inspection. Walker (1914) pointed out that
if $e^{-\kappa}$ is small the probability that all of $m$ independent values of $S^2$ should not exceed
$\frac{4\sigma^2 \kappa}{n}$ is $(1 - e^{-\kappa})^m$, so the probability that at least one should exceed that amount is

$$1 - (1 - e^{-\kappa})^m. \qquad (30.83)$$

Davis (1941) gives tables of this function.

30.51. Both the Schuster and the Walker tests depend on a knowledge of $\sigma^2$. Since
the mean value of $S^2$ in (30.82) is $\frac{4\sigma^2}{n}$, the usual procedure is to consider the test as a
comparison of $S^2$ with $E(S^2)$; but $\sigma^2$ itself has to be estimated from the original data.


30.52. Fisher (1929a) has given a test which avoids the inexactitude due to the
estimation of $\sigma^2$. If $v$ is the estimate and $S^2$ is the largest intensity, then the probability that

$$g = \frac{S^2}{2v} \qquad (30.84)$$

will exceed a given value is

$$
\nu (1 - g)^{\nu - 1} - \binom{\nu}{2} (1 - 2g)^{\nu - 1} + \cdots + (-1)^{m-1} \binom{\nu}{m} (1 - mg)^{\nu - 1}, \qquad (30.85)
$$

where $\nu = \frac{1}{2}(n - 1)$, $n$ being the (odd) number of observations, and $m$ is the greatest
integer less than $1/g$. The result was extended by Stevens (1939a); see also Fisher (1940a)
and Finney (1941a). Davis (1941) also gives tables of this function.
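The alternating series for Fisher's test is short enough to evaluate directly. The sketch below assumes the usual form of the series, the sum over j = 1, ..., m of (−1)^(j−1) C(ν, j)(1 − jg)^(ν−1); the function name `fisher_g_tail` is hypothetical, and the upper limit is taken as the integer part of 1/g, which gives the same value as "the greatest integer less than 1/g" because the extra term vanishes when 1/g is an integer.

```python
import math

def fisher_g_tail(g, nu):
    """Probability that the largest of nu intensity ratios exceeds g (0 < g <= 1)."""
    if not 0.0 < g <= 1.0:
        raise ValueError("g must lie in (0, 1]")
    if nu < 2:
        raise ValueError("nu must be at least 2")
    m = min(nu, int(math.floor(1.0 / g)))
    total = 0.0
    for j in range(1, m + 1):
        base = max(0.0, 1.0 - j * g)       # guard against tiny negative rounding errors
        total += (-1) ** (j - 1) * math.comb(nu, j) * base ** (nu - 1)
    return total

n = 5                                      # an odd number of observations (illustrative)
nu = (n - 1) // 2                          # nu = (n - 1) / 2 = 2
half = fisher_g_tail(0.75, nu)
certain = fisher_g_tail(1.0 / 3.0, 3)      # the largest of 3 ratios is never below 1/3
never = fisher_g_tail(1.0, 5)
```

For ν = 2 the ratio of one intensity to the sum of the two is uniform on (0, 1), so the larger ratio exceeds 0.75 with probability one half, which the series reproduces.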


30.53. All the tests we have described are based on random normal variation in the
original series ; but in practice nobody would embark on the labour of a periodogram 
analysis unless he had satisfied himself that the data were not random. It seems to me, 
therefore, that these tests are really off the main point, being tests based on a hypothesis 
which we have already rejected. They are not without their usefulness, however. We 
may assume with some confidence that if a particular intensity in the series is not shown
as significant on the hypothesis of random variation, it is not significant when the series
is systematic. What does not follow is that if one intensity is significant then others must 
be so, even if they exceed the significance values ; for they are not independent of the 
significant value, at least for short series. What we ought to do, perhaps, is to extract 
the component which is considered significant from the series and then analyse the 
remainder ; and so on as long as significant terms appear. But this is hardly a practical 
computational possibility. Tests of significance in the periodogram, as in the correlogram, 
remain undiscovered. 
Example 30.10

Let us examine the significance of the 20 periods of the Beveridge periodogram given
in 30.44.

Sir William gave the value of the mean intensity $E(S^2)$ in his original paper as 5.898. Expressing the
intensities as a multiple $\kappa$ of this amount, we find:

    Period      κ        Period      κ
     2.735     1.33      11.000     5.74
     3.417     2.69      12.000     3.95
     4.417     2.79      12.800     7.80
     5.100     7.18      15.250    12.91
     5.415     4.01      17.333     9.25
     5.667     5.55      20.000     6.42
     5.933     4.01      24.000     4.43
     7.417     3.68      35.000     3.95
     8.091     3.94      54.000     4.42
     9.750     5.75      68.000     2.30

There are 305 trial periods in Table 30.9. Let us consider the probability that at least
one of 305 independent values of $\kappa$ will exceed given values, that is to say, the probabilities
given by (30.83). We find:

     κ     Probability
     2       1.000
     4       0.996
     6       0.531
     8       0.097
    10       0.014

On this basis we should be inclined to attribute significance to the period 15.25, for which
$\kappa = 12.91$. We have no right to be surprised that at least one value exceeds $\kappa = 6$. If
we take this value as the critical one, only the periods 5.100, 12.800, 15.250, 17.333 and
20.000 would be significant, that is to say, five out of 20.

Again, since $e^{-5} = 0.007$, we should expect to find in 305 independent members two
in excess of 5. Actually there are eight. But they are not independent and we cannot
rely on this comparison to say that six are significant. On the whole, however, it looks
as if at least three-quarters of the periods are not significant, and possibly more. The
example will illustrate the difficulty of testing the significance of the periodogram as a whole.
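The arithmetic of this example can be retraced as follows. The sketch takes the twenty values of κ as read from the table above, with the sixteenth period taken as 15.250 to agree with the discussion in the text, and the 305 trial periods of Table 30.9; the variable names are hypothetical.

```python
import math

# Intensities as multiples kappa of the mean intensity, keyed by trial period.
KAPPA_BY_PERIOD = {
    2.735: 1.33, 3.417: 2.69, 4.417: 2.79, 5.100: 7.18, 5.415: 4.01,
    5.667: 5.55, 5.933: 4.01, 7.417: 3.68, 8.091: 3.94, 9.750: 5.75,
    11.000: 5.74, 12.000: 3.95, 12.800: 7.80, 15.250: 12.91, 17.333: 9.25,
    20.000: 6.42, 24.000: 4.43, 35.000: 3.95, 54.000: 4.42, 68.000: 2.30,
}
TRIALS = 305

def walker_probability(kappa, m):
    """Probability that at least one of m independent values exceeds kappa."""
    return 1.0 - (1.0 - math.exp(-kappa)) ** m

# Probabilities for the tabulated levels of kappa.
table = {k: walker_probability(k, TRIALS) for k in (2, 4, 6, 8, 10)}

# Periods exceeding the critical value 6, and the count exceeding 5.
above_6 = sorted(p for p, k in KAPPA_BY_PERIOD.items() if k > 6)
above_5 = sorted(p for p, k in KAPPA_BY_PERIOD.items() if k > 5)

# Number expected above 5 among 305 independent values.
expected_above_5 = TRIALS * math.exp(-5)
```

The comparison of `len(above_5)` with `expected_above_5` is only indicative, for the reason the text gives: the intensities at neighbouring trial periods are not independent.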


Lag Correlation

30.54. The idea of serial correlation can be extended to the joint variation of two
series. If we have two series $u(t)$, $v(t)$ in standard measure, we may define the lag
correlation of order $k$ as

$$r(k) = \int u(t)\, v(t + k)\, dt, \qquad (30.86)$$

where the integral includes summation in the case when the series are specified at
equidistant points of time. We note that in this case $r(k)$ is not equal to $r(-k)$ and $r(0)$
is not unity.
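For series observed at equidistant points the lag correlation becomes a sum of products. The sketch below is one possible discrete version under stated assumptions: both series are standardised with divisor n, the sum is divided by the number of overlapping pairs (a normalisation chosen here, not taken from the text), and the two artificial series are built so that the second repeats the first two steps later.

```python
import math

def standardise(x):
    """Express a series in standard measure (zero mean, unit variance, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    return [(xi - mean) / sd for xi in x]

def lag_correlation(u, v, k):
    """Mean product of u(t) and v(t + k) over the overlapping part of the series."""
    us, vs = standardise(u), standardise(v)
    n = len(us)
    products = [us[t] * vs[t + k] for t in range(n) if 0 <= t + k < n]
    return sum(products) / len(products)

def signal(t):
    # Artificial oscillation (an assumption for illustration only).
    return math.sin(2 * math.pi * t / 12) + 0.3 * math.sin(2 * math.pi * t / 5)

u = [signal(t) for t in range(120)]
v = [signal(t - 2) for t in range(120)]      # v follows u two steps later

r = {k: lag_correlation(u, v, k) for k in range(-6, 7)}
best_lag = max(r, key=r.get)
```

Here the maximum falls at lag 2, `r[2]` differs markedly from `r[-2]`, and `r[0]` is well short of unity, which illustrates the two remarks at the end of the paragraph.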
Table 30.10 shows the lag correlations between two series of English wheat prices and
horse populations (for the original series see Kendall, 1944a). The data are shown as a lag
correlogram in Fig. 30.10.


TABLE 30.10

Lag Correlations for Two Series of English Wheat Prices and Horse Populations (Deviations
from a Simple Nine-Year Average).

(The order of the correlation is the number of years by which horse population lags behind wheat price,
e.g. $r_{10}$ is the correlation of wheat price with the horse population of ten years earlier.)


Fig. 30.10. Lag Correlation of Wheat Prices and Horse Populations (Table 30.10).
The systematic appearance is unmistakable and we notice in particular that the maximum
correlation occurs between the wheat price and the horse population of two years later. 
This bears the obvious explanation that when a farmer earns more he buys or breeds more 
horses ; but it does not follow logically that this must be so or that there need be any 
causal nexus between the two series. If two autoregressive series are oscillating with 
mean periods which are close together and only a short span of experience is available for 
scrutiny, then lag correlations of the damped sinusoidal type may appear, as it were, by 
accident. 

30.55. We have now reached the end of our account of the statistical analysis of 
time-series and the end of this book ; and the final words we have to say of the one will 
apply generally to the other. Much has been left unsaid, partly from lack of space, partly 
from deficiencies in the present state of knowledge, and partly from a desire not to over- 
burden the reader. We have not avoided mathematical analysis where it was necessary 
to advance the argument ; but we have insisted on the expression of results in numerical 
form and the necessity of experimental confirmation whenever it could be obtained. That 
there are gaps in the treatment we have given and unexplored branches of the subject 
to which we have barely referred are not entirely matters of regret ; for the over-early 
and peremptory reduction of knowledge into arts and methods is one of the errors which 
Bacon cautioned us against more than 300 years ago. Much remains to be done ; and this 
book will have served its purpose if the reader is left with the desire to do some of it himself. 


NOTES AND REFERENCES 


The theoretical aspects of the autoregressive series and of moving averages are
discussed in Wold's book on The Analysis of Stationary Time-Series (1938a). The basic
memoir is that by Yule (1927a) on sunspots. For applications to meteorology see Walker
(1931) and to economics Kendall (1944a). Davis's book on The Analysis of Economic Time
Series (1941) contains a great deal of interesting material but should not be read uncritically.
Two earlier papers by Yule (1921 and 1929) are also of interest. See also my paper on
"The Analysis of Oscillatory Time-Series" in the Journal of the Royal Statistical Society
for 1945, a paper by Yule in the same journal, my brochure (in press) on "Researches in
Oscillatory Time-Series", and a symposium introduced by Bartlett in the Supplement to
the Journal for 1940.

The classical work on periodogram analysis is that of Schuster (1898). The books
by Brunt (1931) on The Combination of Observations and by Whittaker and Robinson
(1940) on The Calculus of Observations contain useful introductory accounts; and Davis's
book referred to above has an excellent chapter illustrated with an unusual number of
examples. Papers by Crum (1923) and Greenstein (1935) are of interest. The papers by
Sir William Beveridge (1921, 1922) on wheat prices and rainfall have been justly described
by Davis as a heroic piece of periodogram analysis. Tables facilitating the calculation
of intensities were published by Turner (1913), and more complete tables will be given in
my brochure referred to above. See also the book by Stumpff (1937).

Various short-cut methods of periodogram analysis have been proposed by several
authors, e.g. Oppenheim (1909), Bruns (1921) and Alter (1933, 1937); but their value is
problematical. There is a useful memoir by Bartels (1935) which is worth studying.
EXERCISES


30.1. For the autoregressive series

$$u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}$$

show that if $\varepsilon$ is a random variable and the series is long,

$$
\frac{\operatorname{var} u}{\operatorname{var} \varepsilon} = \frac{1 + b}{(1 - b)\{(1 + b)^2 - a^2\}}
$$

and hence that the variance of the generated series may be much greater than that of
$\varepsilon$ itself.


30.2. For the autoregressive series of the previous exercise use the relation

$$r_{k+2} + a r_{k+1} + b r_k = 0$$

to derive the relation

$$r_k = \frac{p^k \sin (k\theta + \psi)}{\sin \psi}.$$


30.3. If the estimated coefficients $a'$ and $b'$ in the autoregressive scheme are reduced
in the manner of 30.32 by a superposed error, show that



6 

a 


(Yule, 1927a.)


30.4. Show that if, in the autoregressive scheme of Exercise 30.1, $b = 1$, the series
becomes undamped and the correlogram reduces to a simple harmonic. Examine the
effect on the solution (30.23).


30.5. If any series has fitted to it a series generated by the scheme of Exercise 30.1,
$a$ and $b$ being any constants, show that the serial correlations of the residuals are given,
in terms of the serial correlations $\rho_k$ of the series, by

$$
\frac{(1 + a^2 + b^2)\rho_k + a(1 + b)(\rho_{k+1} + \rho_{k-1}) + b(\rho_{k+2} + \rho_{k-2})}{1 + a^2 + b^2 + 2a(1 + b)\rho_1 + 2b\rho_2}.
$$


30.6. Show that the series with an autocorrelation function

$$r(k) = \frac{\sin \lambda k}{\lambda k}$$

has a periodogram which is zero for periods less than $\frac{2\pi}{\lambda}$ and has ordinate $\frac{\pi}{\lambda}$ for periods greater
than $\frac{2\pi}{\lambda}$, i.e. has a continuous spectrum.


30.7. In equation (30.71), noting that the dominant term vanishes for a 
where $m$ is an integer, show that for such a "vanishing" trial period $\mu$

( TYh \ 

1 -{-—fiu approximately. 


P 


'IVITX 

n 
Hence the width of a peak in the periodogram is approximately ... , and the main peak

will be flanked by smaller peaks of the same width. (This “ side-band ” effect is another 
complication in the interpretation of the periodogram, but not apparently a very serious 
one.) 


30.8. If a series of values $u_1, \ldots, u_n$ is supplemented by a number of zeros as
$u_0, u_{-1}, u_{-2}, \ldots$ and $u_{n+1}, u_{n+2}, \ldots$, as is necessary, and the resulting series differenced,
show that

$$
t_j = \binom{2j}{j} P_0 - 2\binom{2j}{j-1} P_1 + 2\binom{2j}{j-2} P_2 - \cdots + 2(-1)^j P_j,
$$

where $t_j$ is the sum of squares of $j$th differences and

$$P_j = \sum_{k=1}^{n-j} u_k u_{k+j}.$$

Hence show that the arithmetic of serial correlation may be related to that of the
variate-difference method, and vice-versa.


30.9. Show that the serial correlations of a long series obtained by differencing a
random series $m$ times are given by

$$
r(k) = (-1)^k \frac{m (m - 1) \cdots (m - k + 1)}{(m + 1) \cdots (m + k)}
$$

and hence that the correlogram of such a series oscillates.

(Yule, 1921.)


30.10. The Whittaker periodogram. Writing

$$\eta^2(\mu) = \frac{\operatorname{var} m}{\operatorname{var} u},$$

where var $u$ is the variance of the series and var $m$ is the variance of the sums $m$ of (30.73),
show that if


a sin |- 

A 


where hi is uncorrelated with periodic terms, then


Nn 

ar fi- sm“ -- 


Tj- {/!.) 


2Y2 sin2 ^ 
la'‘^ 4- var h 


4- var h 


Hence show that, in the neighbourhood of $\lambda$, the graph of $\eta$ as ordinate with $\mu$ as abscissa
(Whittaker's periodogram) has a peak of breadth flanked by smaller peaks.

(Whittaker, Month. Notes R. Astr. Soc., 1911, 71 ; cf. Whittaker and Robinson, Calculus of 
Observations.) 



APPENDIX A 


ADDENDA TO VOLUME I 

(1) Frequency and Distribution Functions 

An interesting paper by Burr (1942) considers the possibility of fitting elementary 
mathematical functions, not to the frequency function as has been the almost universal 
practice hitherto, but direct to the distribution function. This approach seems to merit 
further attention. In general, the distribution function has fewer analytical peculiarities 
than the frequency function — for instance, it cannot be infinite — and in applications to 
sampling it is the former which is nearly always required. The frequency function can, 
of course, be derived from the distribution function to a close approximation by differ- 
encing, or differentiation, processes which are usually easier to carry out than the inverse 
processes of integration. 

(2) Extension of the Carleman Criterion (4.22) 

Cramér and Wold (1936) have extended Carleman's criterion for uniqueness in the
problem of moments in the following form:

If

$$\lambda_{2n} = \mu_{2n,0,\ldots,0} + \mu_{0,2n,\ldots,0} + \cdots + \mu_{0,0,\ldots,2n},$$

the distribution is completely determined by its moments if

$$\sum_{n=1}^{\infty} \lambda_{2n}^{-1/(2n)}$$

diverges. It is rather interesting that the criterion is independent of the product-moments.

(3) Convergence of Series Leading to Standard Errors 

The usual type of expansion in differentials, exemplified in 9.6, raises a point of
mathematical difficulty in that the differentials themselves and the remainder terms, though
usually small, may sometimes be large for sampling reasons, however large the sample.
The necessary rigorisation of the process has been given by Derkson (1939) in terms of the
notion of stochastic convergence, that is to say, a sort of statistical convergence in which
the series converges nearly always in a precisely defined sense.

(4) Moments of Moments for Finite Populations 

The formulae for moments of the mean and variance in samples from a finite population
were stated without proof in 11.26. It is obvious that if in these results we let N, the 
population number, tend to infinity, we obtain the formulae for sampling from an infinite 
population. Irwin and I (1944) have recently shown that the process may be reversed 
and the formulae for the finite case derived from those for the infinite case. This offers 
the simplest and most direct method of deriving the formulae known to me. Reference 
may also be made to Sukhatme, "On Bipartitional Functions" (Phil. Trans., 1938, A,
237, 375) and "Moments and Product-Moments of Moment-statistics for Samples of the
Finite and Infinite Populations" (Sankhyā, 1944, 6, 363).
(5) Tied Ranks

In the treatment of rank correlation in Chapter 16 it was assumed that ranking was
always possible; but in practice cases occur when two or more individuals "tie" and the
ranks have to be equalised in some way. This possibility introduces the most intractable
complications into theoretical work, but sometimes ties occur so frequently that a
systematic method of dealing with them is necessary. The subject has been reviewed and
reconsidered by Woodbury (1940) and more recently by myself (Biometrika, 1945, 33, part 3).






Daniels (1944) has recently unified the theory of rank correlation by showing that
Spearman's $\rho$, my $\tau$ and the product-moment coefficient are particular cases of a general
coefficient. In particular he has demonstrated the formula for the covariance of $\rho$ and $\tau$
given in 16.24 as very probably true.



APPENDIX B 
BIBLIOGRAPHY 

The following Bibliography has no pretensions to completeness in spite of its length.
It contains about half the titles recorded in my own notes, which themselves are doubtless
far from comprehensive. Nevertheless, I hope it will be useful to those readers who want
to take their studies of particular subjects somewhat further. By consulting the references
given here and following up the references which they themselves provide, it should be
possible for the reader to acquaint himself with most of what is known, or at least with
what is worth knowing, about a particular topic.

The names of authors are not included in the Index (pages 504 ff.) unless they occur
in the text, since the Bibliography itself is arranged alphabetically under authors' names.
The subjects, however, are indexed, and anyone wishing to consult references on a
particular topic should refer in the first place to the Index, which in turn will refer to the
authors who have dealt with the matter in question.

In general the Bibliography contains only references to theoretical papers; applications
and illustrative material are included only when some theoretical point is involved.
Papers which have been superseded by later work are omitted, except where they have
a historical interest.

In compiling this material I have been particularly indebted to the valuable periodical
reviews of Recent Advances in Mathematical Statistics by Irwin, Hartley and others in
the Journal of the Royal Statistical Society: 1932, 95, 498; 1934, 97, 114; 1935, 98,
88; 1936, 99, 714; 1938, 101, 394; 1939, 102, 406; and 1940, 103, 534.

Many papers written since 1939 are included, but some journals are not available in
war-time so that foreign work published after the entry of various countries into the war
may be incompletely represented. Where possible, the references have been checked
against the original publications, but here also I have had to rely on second-hand references
in cases where the original papers were inaccessible.

Note. Names beginning with de, del, le, St., van, von, etc., are entered under those
titles, i.e. the order is strictly alphabetical.
Ackermann, W. G. (1939). Eine Erweiterung des Poissonschen Grenzwertsatzes und ihre
errors. Ibid., 54, 12.
tions ; to correlations, Bibl., Roff (1937) 489.

Correlated observations, sampling from, Bibl. : 
A. T. Craig (19336) 453, C. C. Craig (1931a) 
453, (1932) 454, Rhodes (1927) 488. See 
also Time-series. 
Correlation, confidence intervals for coefficient,
81 ; Pitman's test for, 111-2 ; significance
of, 235.

Bibl. : Baker (1930b) 444, Bilham (1926)
447, Bispham (1920, 1923) 447, Bonferroni
(1939) 447, Brander (1933) 449, W. Brown
(1909) 449, Brownlee (1910, 1926) 449,
Cheshire and others (1932) 451, Cochran
(1937a) 452, Coleman (1932) 452, Cowles
and Chapman (1935) 453, Day and Fisher
(1937) 455, David (1937, 1938) 456, G. R.
Davies (1930) 455, de Lury (1938) 456,
Deming (1937) 456, Dieulefait (1934a,
1935a) 456, S. C. Dodd (1937) 457, Dunlap
(1931) 458, Eells (1929) 459, Ezekiel (1930a)
459, Fischer (1933a, b) 460, Fisher (1915,
1918, 1921c, 1924a) 461, Fréchet (1933)
463, Frisch (1929) 463, Frisch and Mudgett
(1931) 463, Garwood (1933) 464, Geary
(1927) 464, Gehlke and Biehl (1934) 464,
Geiringer (1933) 464, Jeffreys (1939c) 471,
Khintchine (1928) 473, Kuzmin (1939) 474,
Lindblad (1937) 476, Merzrath (1933) 478,
A. N. K. Nair (1942) 479, Newbold (1925)
479, E. S. Pearson (1923, 1924, 1931a, 1932)
482, K. Pearson (1897b, 1900a, b, 1902a)
483, (1904, 1905, 1907a, 1909, 1910, 1913a, b,
1914, 1921) 484, (1920b, 1925b) 485, Pitman
(1939c) 486, Prokopovic (1935) 487, Quensel
(1938) 487, Rider (1932) 488, Romanovsky
(1925a) 489, Soper (1913, 1914, 1917) 492,
Steffensen (1934) 492, Stouffer (1934,
1936a, b) 493, "Student" (1908b) 493,
Thorndike (1937) 494, Thouless (1939) 495,
Tschuprow (1925, 1928) 495, (1934) 496,
Wicksell (1917a, b, 1921, 1933) 499, Yasukawa
(1925) 501, Yule (1897a, b, 1906,
1907, 1910) 502.

See also Multiple Correlation, Regression.

Correlation ratio, Bibl. : Hotelling (1925) 469, Isserlis
(1914, 1916) 470, Kelley (1935) 472, Musselman
(1926) 479, E. S. Pearson (1927) 482,
K. Pearson (1905, 1910, 1911a, b, 1915a)
484, (1917, 1923b) 485, "Student" (1913)
493, Wallis (1939) 498, Wishart (1932a) 500.

Correlogram, 404-12 ; significance of, 412-13 ; of
general linear series, 420-1 ; relation with
periodogram, 432-3.

Cost of living, Bibl. : Bennett (1920) 446, Bowley
(1919) 448, Konüs (1939) 474.

Cotton yarn, Bibl., Tippett (1935) 495.

Counting experiments, Bibl., Peierls (1935) 486,
Tippett (1932) 495.

Coutts, J. R. H., data from, (Table 22.1) 150.

Covariance, analysis of, 237-45. Bibl. : Bailey 
(1931) 444, Bartlett (1936d, 1936c) 445, 
Brady (1935) 449, Cochran (1934) 452, 
Cornish (1940c) 453, Cox and Snedecor 


(1936) 453, Hirschfeld (1937) 468, K. R. 
Nair (1940a) 479, Snedecor (1935) 492, Wilks 
(1936) 499, (1938c) 500, Wishart (1936) 501.

Covariance, distribution of, (Example 28.1) 334-5. 

Cramér, H., ω²-test, 108-9 ; Carleman criterion,
440.

Critical region, 270, (Example 27.2) 312-13. 

Crop estimation, Bibl., Yates (1936c) 502.

Crum, W. L., N.R., 437.

Cumulants, Bibl. : Ayyangar (1938) 444, Cornish
and Fisher (1937) 453, C. C. Craig (1931c)
454, Dressel (1940, 1941) 458, Frisch (1926)
463, Gotaas (1936) 465, Thiele (1931) 494.
See also k-statistics, Moments.

Curtiss, J. H., N.R., 216.

Curve fitting, Bibl. : Elderton and Hansmann
(1934) 459, Fisher (1912) 461, Jones (1937a)
472, Kerrich (1936) 473, Koshal (1933,
1935, 1939) 474, Myers (1934) 479, Nair
and Shrivastava (1942) 479, Nair and
Banerjee (1943) 479, K. Pearson (1901c) 483,
Rhodes (1930) 488, Roos (1937) 489,
K. Smith (1916) 492, Snow (1911) 492,
Wald (1940a) 497. See also Least Squares,
Regression, Trend.

Curvilinear regression, 145-74. Bibl., Mendershausen
(1937a) 477, T. V. Moore (1937)
478 ; and see Regression.

Cycle, 397-8. See Periodicity.

Cyclical effects, tests for, 124-7, 370. See
Periodicity.

D²-statistic, N.R., 359. Bibl. : Bhattacharya and
Narayan (1942) 446, R. C. Bose (1936a, b)
447, R. C. Bose and Roy (1938c, 1940)
448, S. N. Bose (1935, 1937) 448, Roy
(1939a) 489. See also Discriminatory
Analysis, Multivariate Analysis.

Daly, J. F., on shortest confidence intervals, 82 ;
on bias in tests, 323 ; N.R., 304.

Daniels, H. E., (Example 23.2) 183-5 ; rank 
correlations, 441. 

Dantzig, C. B., N.R., 304. 

David, F. N., confidence intervals for correlations,
81 ; N.R., 304.

Davis, H. T., time-series, 433, 434 ; N.R., 394, 
437. 

Day, E. E., N.R., 245. 

Death rates, Bibl., Farr (1919, 1920) 460, Pearson
and Tocher (1916c) 485.

Decomposition of series, Bibl., Anderson (1927)
443, Smirnoff (1935) 491. See also Time-series.

Decreasing functions, Bibl., C. D. Smith (1939) 
491. 

Degrees of freedom, of “ Student’s ” t, 102 ; of 
hypotheses, 270. 

De Lury, D., N.R., 137. 
Denumerable probabilities, Bibl., Steinhaus (1923)
492.

Dependence, see Independence, Correlation. 

Derkson, J. B. D., on stochastic convergence, 440.

Design, of sampling inquiries, 247-68 ; preliminary
points, 248-9 ; stratified sampling,
249-52 ; design of experiments, 252-4 ;
orthogonality, 254 ; replication, 255 ;
randomisation, 255-6 ; sensitivity of a
test, 256-7 ; Latin squares, 257-62 ; confounding,
262 ; design and randomisation,
263-6.

Bibl. : Bhattacharya (1943) 446, Christidis
(1931) 451, Fisher (1935c) 462, Jeffreys
(1939b) 471, "Student" (1938) 493, Wold
(1943) 498, Yates (1939e) 502. See also
Blocks, Factorial Experiments, Latin
Squares, etc.

Determinantal equations, Bibl., Girshick (1939)
466. See also Matrix.

Deviance, footnote, 178. 

Difference, of two means, test of (equal variances) 
109-11 ; (unequal variances) 111-14. See 
also Behrens’ Test, Two Samples. 

, of two variances, 115-16. 

, equations, Bibl., Frisch (1932) 463,
Marples (1932) 477. See also Autoregression
Equations.

Differences of variates, Bibl., Irwin (1937a) 470. 

Dilution method, Bibl., R. D. Gordon (1939) 466, 
Matuzewski and others (1935) 477. 

Dirichlet integrals, 298. 

Discontinuous variates, Bibl. : dell’ Agnola (1937) 
456 ; Guldberg (1934) 466, Muench (1938) 
478, H. W. Norton (1937) 481, Ottestad 
(1937, 1938) 481. 

Discordant samples, 128. 

Discriminatory analysis, discriminant function,
341-8. Bibl. : Barnard (1936) 444, Bartlett
(1939c) 445, Dwyer (1942) 458, Fisher
(1936a, 1938c, 1939b, 1940d) 462, P. L. Hsu
(1939b, 1941a, 1941c) 469, H. F. Smith
(1936) 492, Travers (1939) 496, Wallace
and Travers (1938) 498, Welch (1939b) 498,
Wilks (1938c) 500. See also Multivariate
Analysis.

Dispersion, Bibl., Norris (1938) 481 . See Variance, 
etc. 

matrix, 330, 341, N.R., 358. 

Dissection of frequency- distributions, Bibl., Burrau 
(1934) 450. 

Distributed lags, see Lags. 

Distributions, generally, Bibl. : Ambarzumian
(1937) 443, Baten (1933a) 445, (1934) 446,
Bispham (1922) 447, Bochner and Jessen
(1934) 447, Bochner (1937) 447, Bowley
(1933) 448, Burr (1942) 450, Camp (1937)
450, Cannon and Wintner (1936) 450,
Chapelin (1932) 451, Cramér and Wold
(1936) 454, Edgett (1931) 458, Eyraud
(1938a) 459, Glivenko (1933) 465, Guldberg
(1935) 466, Hansmann (1934) 467, Hartman
and others (1937) 467, (1939) 468, Haviland
(1934a, b, 1935, 1939) 468, R. Henderson
(1907) 468, Jessen and Wintner (1935)
471, Khintchine (1937a) 473, Kullback
(1936b) 474, Mazzoni (1934) 477, K. Pearson
(1923c, 1924a) 485, R. Schmidt (1934) 490,
von Mises (1939a) 497.

Dodd, E. L., period generated by moving average, 
384, N.R., 394. 

Doob, J., N.R., 45. 

Dosage-mortality, Bibl., Garwood (1941) 464. 

response, Bibl., Irwin and Cheeseman (1939)
470.

Dugué, D., N.R., 45.

Duration of play, Bibl., de Finetti (1939b) 456,
Fieller (1931a) 460.

Eden, T., on Fisher’s distribution, 206, (Example 
23.8) 214, N.R., 216. 

Edgeworth, F. Y., N.R., 45. 

Edwards, J., Integral Calculus, footnotes, 44 and
50.

Efficiency, of estimators, 5-7 ; of maximum
likelihood estimators, 18-19 ; of moments
in fitting Pearson curves, 43-4 ; of sampling,
Bibl., Yates and Zacopanay (1935c) 502.

Egg-production, in laying hens, (Table 29.5,
Figure 29.5) 368.

Egyptian skulls, (Example 28.3) 345-8.

Elasticity of demand, Bibl., Mosak (1939) 478,
Schultz (1933) 490.

Elderton, E. M., (Example 21.14) 133, N.R., 266.

Elderton, Sir William P., N.R., 45.

Electric lamps, testing of, (Example 23.1) 179-80.

Elimination of variates, in regression analysis,
167-70.

Enumeration in sampling, Bibl., Cochran (1939b)
452.

Equidetectability, curves of, 318. 

Equimodal distributions, Bibl., Mouzon (1930) 478. 

Error, in variance-analysis, 187. 

Errors, of first and second kind, 270, (Exercise
26.5) 305.

, general theory of, Bibl. : Brelot (1936,
1937) 449, Campbell (1935) 450, Cramér
(1928) 454, Deming and Birge (1934) 456,
Edgeworth (1905, 1906) 458, Jeffreys (1933,
1937c, 1938d, 1939d) 471, Mahalanobis
(1922) 476, Wertheimer (1932) 499. See
also Least Squares.

Estimation, generally, 1-49, 50-62 ; in analysis 
of variance, 181, 218-19. 

Estimator, definition, 2 ; consistence of, 3 ; bias 
of, 3-4 ; efficiency of, 5-10 ; sufficiency of, 
7-12 ; approximation to, 22-4 ; most 
general sufficient form, 24-5 ; accuracy of, 
28-9 ; ancillary, 32-3 ; in multivariate 
case, 33-42 ; location and scale, 40-2 ; 
by minimum variance, 50-5 ; by minimum 
χ², 55-8 ; by inverse probability, 58-9 ; 
by least squares, 59-60. See also Maximum 
Likelihood, Minimum Variance. 

Bibl. : Aitken and Silverstone (1942) 
443, Beall (1939) 446, S. S. Bose and 
Mahalanobis (1938b) 448, Darmois (1935, 
1936) 455, O. L. Davies and Pearson (1934) 
455, Doob (1936) 457, Dugué (1936a, b, 
1937b) 458, Fisher (1925b) 461, (1934d, 
1938b, d) 462, Geary (1942, 1944) 464, 
Halphen (1939) 467, Neyman (1937b) 480, 
E. S. Pearson (1937a, 1939) 483, Pitman 
(1937b, 1939a) 486, Wald (1939ff) 497. 

Expectation of life, see Life. 

Expected values, see Mean Values. 

case, in sociological data, Bibl., Stouffer and 
Tibbits (1933) 493. 

Expenditure of families, (Example 23.9) 214-15. 

Exponential distribution, (Exercise 26.8) 305-6. 
Bibl., Paulson (1941) 482, Sukhatme (1936b) 
493. 

Extra-sensory perception, Bibl., Greenwood and 
Stuart (1937) 465, Stevens (1939b) 493. 

Extremes, distribution of, Bibl. : Daniels (1941) 
455, de Finetti (1932) 455, Dodd (1923) 
456, Fisher and Tippett (1928a) 461, 
Gumbel (1934, 1935a) 466, McKay (1935) 
477, Olds (1935) 481, Tippett (1925) 495. 
See also mth Values. 

F-distribution (variance ratio), Bibl., Merrington 
and Thompson (1943) 478. See Fisher’s 
Distribution. 

Factor analysis (psychology), Bibl. : Bartlett 
(1937c) 445, W. Brown (1935) 449, Burt 
(1937a, b, 1938a, b) 450, Camp (1932, 1934) 
450, Darmois (1934) 455, Emmett (1936) 
459, Hoel (1937, 1939) 468, Irwin (1933) 
470, Ledermann (1938) 475, Roff (1937) 
489, Thomson (1916, 1919b, 1939) 494, 
Thurstone (1935, 1938) 495. 

Factorial experiments, 199-202. Bibl. : Barnard 
(1936) 444, R. C. Bose and Kishen (1941) 
448, Cornish (1936, 1940b, c) 453, Goulden 
(1937, 1938) 465, P. L. Hsu (1943) 470, 
Kishen (1940) 473, Wishart (1938) 501, 
Yates (1937b) 502. 

moments, Bibl., Gonin (1936) 465, Ottestad 
(1939) 481. 

sums, in fitting regressions, (Example 22.8) 
164-5. 

Factorisation of variables, Bibl., S. C. Dodd (1927) 
457. 


Families of alternatives, 275-6. 

Feller, W., N.R., 303. 

Fiducial inference, 85-95. Bibl. : Bartlett (1939a) 
445, Fisher (1933, 1935a, 1935b, 1936c, 
1937b, 1939a, 1940c, 1941a) 462 ; Garwood 
(1936) 464, Ricker (1937) 488, Segal (1938) 
491, Wilks (1938b, c) 499, (1939a, b) 500. 
See Confidence intervals. 

Field experiments, Bibl., Wishart and Saunders 
(1935) 501. See Design. 

Fifteen-constant surface, Bibl., K. Pearson (1925a) 
485. 

Filon, L. N. G., N.R., 45. 

Finite populations, sampling from, Bibl. : Church 
(1926) 452, Hansen and Hurwitz (1940) 
467, Irwin and Kendall (1944) 470, Isserlis 
(1918c, 1931) 470, Neyman (1925) 480, 
O’Toole (1934) 481, Sukhatme (1944) 494, 
Tschuprow (1918b, 1921, 1923) 495. 

Finney, D. J., z-test, 199 ; test of significance in 
periodogram analysis, 434 ; N.R., 137, 216. 

Fisher, R. A., fitting by moments, 43 ; fiducial 
probability, 90 ; tables for Behrens’ test, 
92, 93, 111 ; expansion of “Student’s” 
integral, 101 ; tables of t, 102 ; difference 
of two means, 110 ; z-distribution, 116, 
117 ; configuration of a sample, 127 ; 
fitting regressions, 165 ; theorem on sum 
of squares, 176-7 ; design of experiments, 
263 ; discriminatory analysis (Example 
28.2) 342-4 ; distribution of canonical 
correlations, 357 ; significance of a periodogram, 
434 ; N.R., 45, 61, 83, 94, 136, 173, 
216, 246, 266, 369. 

Exercises from : (Exercise 17.1) 45, 
(Exercises 17.4, 17.5, 17.6) 46, (Exercises 
17.12, 17.15, 17.16) 48, (Exercise 17.19) 49, 
(Exercise 18.3) 61, (Exercises 20.1, 20.2) 
94-5. 

Fisher’s distribution (z-distribution), properties of, 
116-18 ; in variance analysis, 179, 199 ; 
in non-normal case, 205-6, 234-6, (Example 
26.8) 289-91 ; in linear hypothesis, 301 ; 
in discriminatory analysis, 346. 

Bibl. : Aroian (1941) 444, R. A. Chapman 
(1938) 451, Cochran (1940a) 452, 
Daniels (1938a) 454, Eden and Yates (1933) 
468, Fisher (1924c) 461, P. L. Hsu (1941c) 
469, Lawley (1938) 475, McCarthy (1939) 
477, Paulson (1942) 482, Welch (1937) 498. 

Fitting, see Curve Fitting, Least Squares. 

Flood flows, Bibl., Gumbel (1938a, 1941) 466. 

Fluctuations in time-series, Bibl., R. A. Gordon 
(1937) 465. See Time-series. 

Forecasting, Bibl. : Cowles (1933) 453, Cowles and 
Jones (1937) 453, de Finetti (1937) 466, 
Schultz (1930) 490, Yates (1936a) 502. 

Forsyth, A. R., Calculus of Variations, footnote, 60. 

Fourier analysis, see Harmonic Analysis, Periodicity. 

Fragmentary samples, Bibl., Wilks (1932a) 499. 

Frankel, L. R., N.R., 136, 266. 

Freedom, degrees of, see Degrees of Freedom. 

Frequency-distributions, see Distributions. 

Frequency theory of probability, Bibl. : Campbell 
(1939) 450, Cantelli (1923, 1932, 1933b) 450, 
(1936) 451, Dorge (1934, 1936) 458, von 
Mises (1931) 497. See Probability, Random 
Sequence. 

Friedman, M., (Example 23.9) 214-15. 

Frisch, R., N.R., 358. 

Galton’s problem, Bibl. : Galton (1902) 464, Irwin 
(1925a) 470, K. Pearson (1902c) 484. See 
Rank Correlation. 

Gamma distribution, Bibl., Kibble (1941) 473. 
See Type III. 

Garwood, F., confidence intervals for Poisson 
distribution, 81. 

Gauss, K. F., variance of residuals, 60-1 ; standard 
errors, 153 ; N.R., 45. 

Gaussian distribution, see Normal Population. 

Geary, R. C., distribution of t, 102-4 ; test of 
normality, 106 ; theorem on independence, 
118 ; (Exercises 21.1, 21.2) 137-8 ; N.R., 
45, 136. 

Geary’s ratio, Bibl., Geary (1935a, b, 1936a) 464, 
Tricomi (1937) 495. 

General factor (intelligence), see Factor Analysis. 

Generalised distance, of Mahalanobis, N.R., 359. 

Generating functions, Bibl., Aitken (1931) 442. 
See Characteristic Functions. 

Geometric Mean, Bibl., Camp (1938a) 450, Norris 
(1938, 1940) 481. 

Germination of wheat-seeds, (Example 23.7) 207-9. 

Gini’s mean difference, 108. 

Girshik, M. R., (Exercise 28.11) 362, N.R., 359. 

Glass, seed in, (Example 23.6) 202-4. 

Goodness of fit, tests of, 106-9. Bibl. : David 
(1939) 455, Neyman (1937a) 480, K. Pearson 
(1934) 486, Thomson (1919a) 494. See 
Chi-squared. 

Gosset, W. S. (“ Student ”), 80, 266, N.R., 394. 

Gould, C. E., (Example 23.6) 202-4. 

Goulden, C. H., N.R., 216, 266. 

Grades, see Rank Correlation, Galton’s Problem. 

Graduation, Bibl., Aitken (1933a, b, c) 442, Keyfitz 
(1938) 473. See Interpolation, Least 
Squares, Orthogonal Polynomials, Trend. 

Graeco-Latin square, 261-2. Bibl., R. C. Bose 
(1938b) 448. 

Gram-Charlier series, estimation in (Exercise 18.1) 
61 ; for non-normal t, 103 ; goodness of 
fit in, 109 ; in z-distribution, 116. Bibl. : 
Aitken and Oppenheim (1931) 442, Aitken 
(1932) 442, Aroian (1937) 444, Baker 
(1930d, 1935) 444, Charlier (1906, 1912, 
1928, 1931) 451, Cornish and Fisher (1937) 
453, C. C. Craig (1931b) 454, Cramer (1926, 
1935b) 454, Doetsch (1934) 457, Edgeworth 
(1905) 458, Gram (1879) 465, Hildebrandt 
(1931) 468, Jacob (1933, 1935, 1937) 471, 
Meisener (1938) 477, Quensel (1938) 487, 
Samuelson (1943) 490, Schmidt (1934) 490, 
Steffensen (1930) 492, Wicksell (1917b, 
1934a) 499. 

Greenstein, B., N.R., 437. 

Grouping corrections, Bibl. : Abernethy (1933) 
442, Alter (1939) 443, Baten (1931) 445, 
Blümel (1939) 447, Burkhardt and Stackelberg 
(1939) 449, Carver (1933, 1936) 451, 
C. C. Craig (1936c, 1941b) 454, Elderton 
(1933, 1938b) 459, Kendall (1938a) 472, 
Lewis (1935) 475, Sandon (1924) 490. 

, effect on correlations, Bibl., Gehlke and 
Biehl (1934) 464. 

, significance of, Bibl., Stevens (1937b) 493. 

Groups of experiments, Bibl., Yates and Cochran 
(1938b) 502. 

Hampton, W. M., (Example 23.6) 202-4. 

Hansmann, G. H., N.R., 45. 

Harmonic analysis, Bibl. : T. F. Anderson (1935) 
443, Brunt (1928) 449, Carslaw (1930) 451, 
Fisher (1929a) 461, (1940a) 462, Frisch 
(1928, 1931, 1933) 463, Pollak (1926) 487, 
Turner (1913) 496, Wiener (1930) 499. 
See Periodicity. 

mean, Bibl., Norris (1939) 481. See Mean 
Values. 

Hartley, H. O., on z-test, 199 ; k samples, 299 ; 
N.R., 137, 216, 304. 

Heads and tails, Bibl., Fieller (1931c) 460. See 
Duration of Play. 

Hendricks, W. A., (Exercise 21.9) 139 ; N.R., 136. 

Hermite polynomials, see Tchebycheff-Hermite 
Polynomials. 

Heterogeneous populations, Bibl., Baker (1930c, 
1932) 444. See also Lexis Theory, Stratified 
Sampling. 

Hierarchies in correlation, Bibl., Thomson (1916, 
1919b, 1935) 494, Wilson (1928) 500. See 
Factor Analysis. 

Higham, J. A., (Exercise 29.7) 395. 

Highest audible pitch, (Example 22.4) 152-3, 
(Example 22.5) 155-6. 

Hirschfeld, H. O., see Hartley, H. O. 

Homogeneity, Bibl. : Baker (1941) 444, Hartley 
(1940) 467, Welch (1938a) 498. See k 
samples. 

Horse population and wheat prices, 436. 

Hotelling, H., canonical correlations, 348-58 ; 
(Exercises 28.7-28.10) 360-2 ; N.R., 45, 
136, 359. 

Hotelling’s T, 323, 335-8 ; N.R., 369. Bibl., 
Hotelling (1931) 469, P. L. Hsu (1938c) 469. 

Hsu, P. L., linear hypothesis, 301 ; Wishart’s 
distribution, 333 ; canonical correlations, 
357 ; N.R., 304, 359. 

Hypergeometric series, Bibl. : Ayyangar (1934) 
444, Camp (1925a) 450, O. L. Davies (1933, 
1934) 455, Gonin (1936) 465, K. Pearson 
(1899b) 483, (1924b, c) 485, Romanovsky 
(1925b) 489. 

Hypotheses, testing of, see Statistical Hypotheses. 

Imaginary random variables, Bibl., Eyraud (1938b) 
459. 

Immunity, Bibl., Brownlee (1905) 449. 

Incomes, distribution of, Bibl., Cantelli (1929) 
450, Darmois (1933) 455. 

Incomplete blocks, see Blocks. 

Independence, of quadratic forms, Bibl. : Cochran 
(1934) 452, A. T. Craig (1936a, 1943) 453, 
Madow (1940) 476. 

, statistical, Bibl. : del Vecchio (1933) 456, 
Kac and van Kampen (1939) 472, Marcinkiewicz 
and Zygmund (1937) 477, Tschuprow 
(1934) 496. See also Correlation, 
Contingency, etc. 

Index, distribution of, see Ratio. 

— numbers, Bibl. : Bowley (1926) 448, Claremont 
(1916) 452, Crowther (1934) 454, 
Dodd (1937c) 457, Edgeworth (1925a, b, c) 
459, I. Fisher (1922) 460, Flux (1921, 1933) 
463, Frickey (1937) 463, Frisch (1930) 463, 
Haberler (1927) 467, Konus (1939) 474, 
Persons (1928) 486, Rhodes (1936) 488, 
Schultz (1939) 490, Yates (1939c) 502. 

Indices, correlation of, Bibl. : Baker (1937) 444, 
J. W. Brown and others (1914) 449, Claremont 
(1916) 452. 

Industrial accidents, Bibl., Newbold (1927) 479. 

processes, see Quality Control. 

Inequalities, Bibl. : Mortara (1934) 478, Narumi 
(1923b) 479, Norris (1935, 1937) 481, 
Romanovsky (1938) 489, Shohat (1929) 
491, C. D. Smith (1930) 491, von Mises 
(1939b) 497, Wald (1938) 497. 

Infantile mortality, Bibl., Feld (1924) 460. 

Infection in potatoes, (Example 24.5) 230-2, 
(Example 24.6) 232-3. 

Inference, see Statistical Hypotheses. 

Information, amount of, 29-30 ; loss of, 30-2 ; 
in minimum χ², 57-8. Bibl. : Bartlett 
(1936a, b) 445, Fisher (1934b, 1935a) 462. 

Intensity, of a periodogram, 425. 

Interaction, in variance-analysis, 187, 188-9. 

Interference, analysis of, Bibl., Stevens (1936) 493. 

Interpolation, Bibl. : Comrie (1936) 462, Erdos 
and Turan (1937, 1938) 459, Feldheim 
(1936a) 460, Fisher and Wishart (1927) 
461, Gini (1921) 466, Lidstone (1937) 476, 
Pietra (1932b) 486, Salvemini (1934) 490, 
Simaika (1942) 491, Tchebycheff (1907) 
494. See also Graduation, Least Squares, 
Orthogonal Polynomials. 

Intra-class correlation, 181. Bibl., Harris (1914) 
467, Harris and Gunstad (1931) 467. 

Intrinsic accuracy, in estimation, 28-9. 

Invariants of frequency curves, Bibl., Zoch 
(1934) 503. 

Inverse probability, in estimation, 58-9 ; relationship 
with fiducial inference, 90-1, 93-4. 
Bibl. : Bayes (1763) 446, Fisher (1926c, 
1930a) 461, (1932, 1936a) 462, Isserlis (1936) 
471, Jeffreys (1937b) 471, Tornier (1937) 
495, Wisniewski (1937b) 501. 

Iris (flower), (Example 28.2) 342-4. 

Irregular Kollektiv, 123. See Random Sequence. 

Irwin, J. O., (Exercise 23.1) 216-17 ; sampling 
moments, 440 ; N.R., 216. 

Item analysis, Bibl., Morril (1937) 478. 

Iterations, see Runs. 

J-shaped distributions, Bibl., Elderton (1933) 
459, Solomon (1939) 492. 

Jackson, W. R., N.R., 304. 

Jeffreys, H., (Example 18.5) 56-7 ; fiducial 
inference, 90-1, 93-4 ; N.R., 61, 94, 266. 

Jensen, A., N.R., 266. 

Joint sufficiency, 39. 

Judgments, validity of, Bibl., Eysenck (1939) 469. 

k samples, problem of, 119-22, 295-9 ; bias in, 
323, (Exercise 27.2) 326. Bibl. : Bartlett 
(1934a) 446, Bishop (1939) 447, Bishop and 
Nair (1939) 447, R. C. Bose and Roy (1940) 
448, G. W. Brown (1939) 449, Neyman 
and Pearson (1931b) 480, Pearson and 
Wilks (1933b) 482, Sukhatme (1936b) 493, 
(1937b) 494, Welch (1935) 498, Wilks 
(1935b) 499. See L-tests. 

k-statistics, Bibl. : Fisher (1929b) 461, Fisher and 
Wishart (1931) 462, C. T. Hsu and Lawley 
(1939) 469, Kendall (1940) 472, (1942b) 473, 
Wishart (1929a, b, 1930, 1933b) 500. See 
also Moments, sampling. 

Kelley, T. L., (Example 28.4) 361-2. 

Kermack, W. O., N.R., 136. 

Keynes, Lord, (Exercise 17.7) 47. 

Kolmogoroff, A., confidence intervals for ter- 
minals, 83. 

Kolodzieczyk, St., linear hypothesis, 293 ; N.R., 
304. 

Koopman, B. O., (Exercises 17.13, 17.14) 48, 
N.R., 45. 

Koshal, R., N.R., 46. 

Kronecker delta, 329. 

Kurtic curve, 142. 

Kurtosis, Bibl., Frisch (1934a) 464. 


L-tests, Bibl. : Mahalanobis (1933) 476, Mood 
(1939) 478, Nayer (1936) 479, Paulson 
(1941) 482, Welch (1936a) 498, Wilks and 
Thompson (1937a) 499. See k samples. 

Lag correlation, 435-6. 

Lags, distributed, Bibl. : Alt (1942) 443, Koopmans 
(1941) 474, K. R. Nair (1936) 479, 
Zrzavy (1933) 503. 

Lanarkshire milk investigation, N.R., 266. 

Large numbers, law of, see Convergence in Probability. 

Largest member of a sample, see Extremes. 

of a set of variances, see Variance ratio. 

Latent roots of a matrix, see Matrix. 

Latin squares, 257-62, 266. Bibl. : R. C. Bose 
(1938b) 448, R. C. Bose and Nair (1942b) 
448, Euler (1782) 459, Fisher and Yates 
(1934c) 462, Fisher (1942d, e) 462, Mann 
(1943) 477, H. Norton (1939) 481, Stevens 
(1938b) 493, Welch (1937) 498, Yates (1933c) 
501, (1936a) 502. 

Lattices, distributions on, van Kampen and 
Wintner (1939b) 496. 

Lawley, D. N., N.R., 359. 

Least squares, in estimation, 59 ; in regression 
analysis, 145 ; in time-series, 371. Bibl. : 
Adcock (1878) 442, Aitken (1933a, b, c, 
1935a) 442-3, Davis (1933) 455, David and 
Neyman (1938c) 455, Deming (1931, 1934, 
1935, 1937) 456, Hendricks (1931, 1934) 
468, E. Johnson (1940) 471, Jones (1937a) 
472, Jordan (1932, 1934) 472, Kerrich (1937) 
473, Sheffer (1935) 491, Sheppard (1914, 
1929) 491, Sterne (1934) 493, Wisniewski 
(1937a) 501, Wong (1935) 501. 

Lexis, W., ratio, 119 ; N.R., 216. 

theory, Bibl. : Geiringer (1942) 465, Rider 
(1934) 488, Tschuprow (1918, 1919a) 495, 
von Bortkiewicz (1931) 497. 

Life, expectation of, etc., Bibl. : Brownlee and 
Morison (1911) 449, Dublin and others 
(1935) 458, Greenwood (1922) 466, Gumbel 
(1924, 1925, 1932) 466, Seal (1940) 490, 
Wilson (1938) 500. 

Likelihood, in estimation, see Maximum Likelihood ; 
in testing hypotheses, 277-80, 295-302, 
323-6. Bibl., Fisher (1932, 1934a, b) 
462, Wilks (1935a) 499. 

Likelihood-ratio tests, Bibl. : Daly (1940) 454, 
Neyman and Pearson (1933c) 480, Wilks 
(1938a) 499, Wilks and Thompson (1937a) 
499. See L-tests. 

Limiting form of significance tests, 322. Bibl., 
Peiser (1943) 486. 


Linear equations subject to error, Bibl., Lonseth 
(1942) 476. 

hypotheses, 292-5, 300-2. Bibl., Johnson 
and Neyman (1936) 472, Kolodzieczyk 
(1935) 474. 

Linearity of regression, see Regression. 

Linkage, Bibl., Finney (1940, 1941, 1942) 460, 
N. L. Johnson (1940b) 472. 

Link-relatives, Bibl., Robb (1930) 489. See Index 
Numbers. 

Live births, proportion of males among, (Example 
21.8) 120. 

Location, estimation of parameters of, 40-2 ; 
centre of, 41 ; Pitman’s tests of, 323-6. 
Bibl., Pitman (1939a, 6) 486. 

Logarithmic variate, Bibl. : Finney (1941b) 460, 
Jenkins (1932) 471, Nydell (1919) 481, 
Pae-Tsi-Yuan (1933) 481, Quensel (1936) 
487, Wicksell (1917a) 499, Williams (1937) 
500. 

Loss of information, in estimation, 30-2. 

weight in soil, (Example 22.3) 149-52, 
(Example 22.6) 158. 

m rankings, problem of, (Example 23.9) 214-15. 
Bibl., Friedman (1937, 1940) 463, Kendall 
and Babington Smith (1939b) 472. 

Macaulay, F. R., (Exercise 29.4) 395 ; N.R., 394. 

MacStewart, W., N.R., 304. 

Madow, W. G., N.R., 359. 

Magnetic declination, Bibl., Schuster (1899) 490. 

Magnitude, random division of, Bibl., Fisher 
(1940a) 462, Stevens (1939a) 493. 

Mahalanobis, P. C., N.R., 303, 304, 359. 

Males, proportion in births, (Example 21.8) 120 ; 
marriages of, (Example 21.9) 121-2. 

Markoff, A. A., theorem on least squares, (Exercise 
25.5) 267. 

process (Markoff chains), Bibl. : Doeblin 
(1936, 1937) 457, Elfving (1937, 1938) 459, 
Feldheim (1936b) 460, Fortet (1935-8) 463, 
Frechet (1935, 1936b, 1937a) 463, Geiringer 
(1938) 464, Hadamard and Frechet (1933) 
467, Hostinsky (1937) 469, Kolmogoroff 
(1937b) 473, Lévy (1935b, 1936c) 475, 
Markoff (1912) 477, Mihoc (1934) 478, 
Onicescu and Mihoc (1935-9) 481, Romanovsky 
(1936a) 489, Scukarew (1932) 490. 

Marriage, males according to age at, (Example 
21.9) 121-2. 

rate in England and Wales, (Table 30.2) 397, 
(Example 30.3, Table 30.5, Figure 30.4) 
408-9. 

Martin, E. S., N.R., 359. 

Mass production, see Quality Control. 

Matching problems, Bibl. : Battin (1942) 446, 
D. W. Chapman (1935) 451, J. A. Greenwood 
(1938) 465, (1940) 466, Greville (1938, 
1941) 466, Olds (1938a) 481, Vernon (1936) 
496, Wilks (1932c) 499. 

Mathematical Tripos, distribution of women 
obtaining firsts in, (Example 18.5) 56-7. 

Matrix, arithmetic of, Aitken (1937a, b, 1938) 443, 
Bingham (1941) 447, Dwyer (1941a, b) 458, 
Hotelling (1943) 469. 

Maximum likelihood estimators, 12-49 ; consistence, 
13-15 ; normality, 15-17 ; variance 
of, 17-18 ; efficiency of, 18-19 ; sufficiency, 
19-20 ; for several parameters, 34-49 ; 
variance and covariance of, 36-7 ; relation 
with minimum variance, 53, and with confidence 
intervals, 73-4. 

Bibl. : Carlson (1932) 451, Fisher (1912, 
1921a, 1925b, 1928c) 461, (1932, 1934a) 
462, Hotelling (1930) 469, Jeffreys (1938b, 
1938c) 471, Koshal (1933, 1935, 1939) 474, 
Myers (1934) 479, E. S. Pearson (1937a) 
483, K. Pearson (1936) 486, Welch (1939c) 
499. 

McKendrick, A. G., N.R., 136. 

Mean, arithmetic, estimation of, 2 ; (Example 
17.6) sufficient estimator for, 11 ; (Example 
17.7) 19-20 ; most general distribution for 
which it is estimator (Example 17.10) 22 ; 
significance of, 98-100, (Examples 27.1, 
27.2) 311-12. 

— deviation, in testing normality (Geary’s 
ratio), 106 ; distribution of m.d., Bibl. : 
Fisher (1920) 461, Frechet (1936a) 463, 
Tricomi (1936b, 1937) 495. 

difference, 108. Bibl. : Cantelli (1913) 450, 
de Finetti and Paciello (1930b) 455, de 
Finetti (1931) 455, V. S. Nair (1936) 479, 
Wold (1935) 501. 

values, Bibl. : Aumann (1934-5) 444, Bunak 
(1936) 449, A. T. Craig (1936b) 453, Dodd 
(1934, 1937a, b, c, 1938) 457, Doodson (1917) 
458, Dressel (1941) 458, Norris (1935, 1937) 
481, Wertheimer (1937) 499, Yasukawa 
(1925) 501, Zoch (1935, 1937) 503. 

Means, distribution of, Bibl. : Baker (1930d, 1931, 
1932, 1936, 1940) 444, Behrens (1929) 446, 
R. C. Bose (1938a) 448, Carlson (1932) 451, 
Cochran (1937a) 452, A. T. Craig (1932) 
453, Dodd (1926-7) 456, Dunlap (1931) 458, 
Hall (1927b) 467, Holzinger and Church 
(1929) 469, Irwin (1927, 1929, a, 1930) 470, 
Immer (1937) 470, Isserlis (1918a) 470, 
Jeffreys (1940) 471, Kolmogoroff (1929) 473, 
Pizzetti (1939) 487, Pollard (1934) 487, 
Rhodes (1927) 488, Romanovsky (1929) 
489, Simon (1943) 491, Truksa (iko) 495. 
See also Central Limit Theorem, Mean 
Values. 

— , test of difference, see Difference ; in multivariate 
analysis, 338-41. 



Mean-square contingency, see Contingency. 

successive difference, Bibl. : Hart (1942) 
467, von Neumann and others (1941a, b) 
497, J. D. Williams (1941) 500. 

Median, as estimator, 5 ; confidence intervals for, 
(Exercise 19.5) 84. Bibl. : Cisbani (1938) 
452, Doodson (1917) 458, Gini and Galvani 
(1929) 465, Gini (1938) 465, Gini and 
Zappa (1938) 465, Gulotta (1938) 466, 
Haldane (1942b) 467, Hojo (1931, 1933) 
469, Jackson (1921) 471, K. R. Nair (1940b) 
479, K. Pearson (1931b) 486, Pollard (1934) 
487, Savur (1937a) 490, W. R. Thompson 
(1936) 494, Ville (1936c) 496. 

Migration, see Random Migration. 

Minimum variance, of maximum likelihood estimators, 
18-19 ; in estimation, 50-5. 

estimation, 56-8. 

Missing plot technique, 229-33. Bibl. : Allan 
and Wishart (1930) 443, Cornish (1940a, b) 
453, K. R. Nair (1940a) 479, Yates (1933b) 
501, Yates and Hale (1939b) 502. 

Mode, Bibl. : Doodson (1917) 458, Haldane 
(1942b) 467, K. Pearson (1902b) 484, 
Yasukawa (1926) 501. 

Moment-function, Bibl., U. S. Nair (1939) 479. 
See Characteristic Functions, Generating 
Functions. 

Moments, efficiency of, 43-4. 

of distributions (specification), Bibl. : Cornish 
and Fisher (1937) 453, Fisher (1937a) 
462, R. Henderson (1907) 468, O’Toole 
(1933) 481, Pearl (1937) 482, K. Pearson 
(1936) 486, Romanovsky (1936b) 489, von 
Mises (1937) 497. See Curve Fitting. 

, problem of, Bibl. : Bodewadt (1936) 447, 
Broggi (1934) 449, Chlodovsky (1938) 451, 
Hamburger (1920, 1921) 467, Haussdorf 
(1923) 468, Haviland (1935, 1936) 468, 
Marcinkiewicz (1939) 477, Polya (1920, 
1938a) 487, Stekloff (1914) 492, Stieltjes 
(1918) 493, Widder (1934) 499. 

, sampling, Bibl. : Bernstein (1932) 446, 
C. C. Craig (1928) 453, (1940) 454, Dwyer 
(1937a, 1938, 1940) 458, Fisher (1929b) 
461, Fisher and Wishart (1931) 462, Geary 
(1933) 464, Irwin and Kendall (1944) 470, 
Isserlis (1918b, c, 1931) 470, St. Georgescu 
(1932) 493, Sukhatme (1938c, 1944) 494, 
Tschuprow (1918b, 1921, 1923) 495, Wilks 
(1934, 1936) 499, Wishart (1929a, b, 1930, 
1931a, b, 1933b) 500, Wishart and Bartlett 
(1932b) 500, Ziaud-din (1938) 503. See 
also k-statistics. 

Monotonic functions, in distribution theory, Bibl., 
Bochner (1937) 447. 

Mood, A. M., N.R., 304. 

Moore, G., phases in time-series, 126 ; N.R., 136. 

Morant, G., N.R., 394. 

Morgan, W. A., N.R., 137. 

Mortality, see Life. 

Most-efficient estimator, 6, 10, 18-19. 

Most-selective confidence intervals, 75, 82. 

Moths, effect of weather on, (Example 22.10) 
171-2. 

Moving averages, 372-87, 399. Bibl. : Dodd 
(1939a, 1941a, b) 457, Frisch (1938) 464, 
Wold (1938b) 501. 

mth values, Bibl., Gumbel (1934, 1935a, 1939) 
466. 

Multinomial distribution, Bibl., Kullback (1937) 
474, Lurquin (1937) 476. 

Multiple correlation, Bibl. : Bacon (1938) 444, 
R. C. Bose (1934) 447, Fisher (1928b) 461, 
Hall (1927a) 467, Kelley and McNemar 
(1929) 472, Kullback (1936c) 474, K. Pearson 
and Lee (1908) 484, K. Pearson (1916^) 
485, K. Pearson and Young (1918) 485, 
Soper (1929a) 492, Starkey (1939) 492, 
Tappan (1927) 494, Wilks (1932b) 499, 
Wishart (1931b) 500, Wong (1937) 501. 

curvilinear regression, 167, 236. See Regression. 

happenings, Bibl., Greenwood and Yule 
(1920) 466, K. Pearson (1912b, 1913) 484. 
See Poisson Distribution, Polya Distribution. 

Multivariate analysis, 328-62 ; Wishart’s distribution, 
330-4 ; Hotelling’s distribution, 
335-8 ; significance of set of means, 338-41 ; 
discriminatory analysis, 341-8 ; 
canonical correlations, 348-58. 

Bibl. : Bartlett (1939b, 1941) 445, Bishop 
(1939) 447, Fisher (1936a, b, 1938c, 1939b, 
1940d) 462, Hotelling (1933, 1936a, b) 469, 
P. L. Hsu (1939b, 1941a, c, d) 469, Madow 
(1937, 1938) 476, Mahalanobis (1930, 1936a) 
476, Mahalanobis and others (1936b) 476, 
Martin (1936) 477, Rider (1936) 488, Roy 
(1938, 1939a, b, 1942a, b) 489, Simonsen 
(1937) 491, Wald and Brookner (1941b) 
498. 

distributions, estimation in, 33-7 ; normal, 
see Normal. Bibl. : Leser (1942) 475, 
Lukomski (1939) 476, Mahlmann (1935) 
477. See also Multiple Correlation. 

Myers, R. J., N.R., 45. 

Nair, K. R., confidence intervals for median, 81, 
N.R., 83. 

Nayer, P. N., testing hypotheses, 299 ; N.R., 304. 

Negative binomial, Bibl., Fisher (1941b) 462, 
Greenwood and Yule (1920) 466. See Polya 
Distribution. 

Neyman, J., confidence intervals, 75-6 ; Behrens’ 
test, 93 ; randomised blocks, 214 ; theory 
of tests, 270, 299, 308, 311, 323 ; Exercises 
from : (Exercises 19.2, 19.3) 83, (Exercise 
21.12) 140, (Exercises 26.2, 26.3) 304, 
(Exercises 26.4, 26.5) 305, (Exercise 27.3) 
327. N.R., 45, 83, 94, 136, 172, 266, 303, 
304, 326. 

Nisbet, S. D., (Example 25.1) 258-9. 

Non-central confidence intervals, 66. 

t, Bibl., N. L. Johnson and Welch 
(1940a) 471. 

Non-normal data, in variance-analysis, 205-15. 

populations, Bibl. : Baker (1934) 444, 
Bartlett (1935a) 445, C. C. Craig (1941a) 
454, Geary (1936b) 464, Laderman (1939) 
474, A. N. K. Nair (1942) 479, Pearson and 
Adyanthaya (1928, 1929) 482, E. S. Pearson 
(1931b) 482, Rider (1931a) 487, Rietz (1932, 
1939) 488, Thorndike (1937) 494. 

Non-orthogonal data, Bibl. : K. R. Nair (1942) 
479, Wilks (1938e) 500, Yates (1934a) 501. 

Non-parametric tests, 322. Bibl., Scheffé (1943) 
490. 

Non-random samples, Bibl., “Student” (1909) 
493. 

Nonsense correlations, Bibl., Yule (1926) 503. 

Normal equations, solution of, Bibl., Hoel (1941) 
468. 

population, estimation of mean, 2, (Example 
17.6) 11, (Example 17.7) 19-20, (Example 
18.1) 51 ; estimation of variance, (Example 
17.6) 11, (Example 18.4) 54-5 ; centre of 
location of, (Example 17.22) 42 ; confidence 
intervals for mean, (Example 19.1) 63-4, 
(Example 19.3) 70 ; fiducial distribution, 
85 ; bivariate, (Example 17.17) 33-4, 
(Example 17.18) 37-8 ; regressions of, 
(Example 22.1) 144. 

Bibl. : Baker (1931) 444, Bergstrom 
(1918) 446, Cramer (1923, 1936) 454, Erdos 
and Kac (1939) 459, Haldane (1942a, b) 
467, C. T. Hsu (1940, 1941) 469, Isserlis 
(1918b) 470, Kac (1939) 472, Khintchine 
(1935) 473, Kullback (1935a) 474, Ledermann 
(1939) 475, Lehmann (1939) 475, 
Lengyel (1939) 475, K. Pearson (1924c) 485, 
Polya (1923) 487, Raikov (1938) 487, 
Rhodes (1928) 488, Tricomi (1935, 1936a, 
1936b) 495, Yule (1938b) 503. 

Normalisation of frequency functions, Bibl. : 
Cornish and Fisher (1937) 453, Haldane 
(1938) 467, Mahalanobis and others (1936b) 
476, Paulson (1942) 482. 

Normality, tests of, 105-6. Bibl. : Fisher (1930b) 
461, Geary (1935a, b, 1936a) 464, Geary 
and Pearson (1938) 464, E. S. Pearson 
(1930, 1935c) 482, Yasukawa (1934) 501. 

Nuisance parameters, 134. Bibl., Hotelling (1940) 
469. 

Olds, E. G., N.R., 266. 

Omega, for testing goodness of fit, 107-9. Bibl., 
Smirnoff (1936) 491. 

One-sided confidence intervals, 76. 

Oppenheim, S., N.R., 437. 

Order, in random series, 122-4, and see Random 
Order. 

Orthogonal data, in variance-analysis, 219, 254. 

— polynomials, 146-54, 159-67. Bibl. : Aitken 
(1932, 1933a, b, c) 442, Allan (1930) 443, 
Dieulefait (1934b) 456, Fisher (1921b, 1924b) 
461, Greenleaf (1932) 465, Jackson (1934, 
1937, 1938) 471, Jordan (1932) 472, Lidstone 
(1933) 476, Romanovsky (1927) 489, Sansone 
(1933) 490, Shohat (1935) 491, C. D. 
Smith (1939) 491, Tartler (1935) 494, 
Tchebycheff (1907) 494, Webster (1938) 
498, Wishart (1933a) 500, Wong (1935) 501. 

transformations, Bibl., Landahl (1938) 474, 
Ledermann (1938) 475. 

Oscillations, in time-series, 369, 370, 380, 397-8. 
See Periodicity. 

p-statistics, Bibl., Roy (1939b, 1942a) 489. See 
Multivariate Analysis. 
test, see Combination of Tests. 

Paired comparisons, Bibl., Kendall and Babington 
Smith (1940) 472. 

Parameters, estimation of, see Estimation. 

— of location and scale, 40-2. 

Partial correlations, Bibl. : Isserlis (1914, 1916) 
470, Stouffer (1934) 493, Subramanian 
(1935) 493. 

Pasteurised milk, in feeding, (Example 21.14) 133. 

Path coefficients, Bibl., Engelhart (1936) 459, 
Wright (1934) 501. 

Paulson, E. A., z-distribution, 118 and N.R., 136. 

Peaks, in time-series, 124. 

Pearson distributions, moments in fitting, 43-4 ; 
sufficient estimators in (Exercise 17.18) 49. 
Bibl. : Ambarzumian (1937) 443, Baker 
(1940) 444, Beale (1937) 446, C. C. Craig 
(1936b) 454, Dieulefait (1935b) 456, Fisher 
(1921a) 461, Hildebrandt (1931) 468, Irwin 
(1930) 470, K. Pearson (1894, 1895, 1901b) 
483, (1916a) 484, (1924a) 485, Romanovsky 
(1924) 489, Wishart (1926) 500. See also 
Type I, etc. 

Pearson, E. S., confidence intervals for binomial, 
81 ; t in non-normal case, 103 ; test of 
normality, 106 ; z in non-normal case, 
205 ; (Exercise 23.4) 216-17 ; analysis of 
covariance, 238 ; (Exercises 26.2, 26.3, 26.4, 
26.6) 304-5 ; N.R., 45, 83, 136, 137, 245, 
266, 303, 304, 359. 

, K., (Example 21.14) 133 ; N.R., 45, 137, 
172, 173, 394. 


Peas, yields of, (Example 23.5) 200-2. 

Periodicity and periodogram analysis, 423-5, 
432-3, 433-5. Bibl. : Alter (1924, 1925, 
1926a, b, 1933, 1937) 443, Beveridge (1921, 
1922) 446, Bradley and Crum (1939) 449, 
Brownlee (1924b) 449, Bruns (1921) 449, 
Brunt (1925, 1928) 449, Buys-Ballot (1847) 
450, J. I. Craig (1916) 454, Crum (1923, 
1925) 454, Dodd (1930) 456, (1939a, b, 
1941a, b) 457, Frisch (1928, 1931, 1933) 
463, Greenstein (1935) 465, Hersch (1934) 
468, Kalecki (1935) 472, Koopmans (1940) 
474, Kuznets (1929, 1933) 474, Larmor and 
Yamaga (1917) 475, Mitchell (1913) 478, 
Mitchell and Burns (1935) 478, Moore (1914, 
1923) 478, Moulton (1938) 478, Oppenheim 
(1909) 481, Pietra (1925) 486, Pollak (1927) 
487, Pollak and Kaiser (1935) 487, Powell 
(1930) 487, Savur (1941) 490, Schuster 
(1898, 1899, 1906) 490, Soper (1929b) 492, 
Starkey (1939) 492, Stumpff (1926, 1937) 
493, Tinbergen (1937, 1938) 495, Tintner 
(1935) 495, Trachtenberg (1921) 495, Vinci 
(1934) 496, Walker (1914, 1925, 1927, 1931) 
498, Wallis and Moore (1941) 498, Yule 
(1927a) 503. See also Harmonic Analysis, 
Time-series. 

Phases, in time-series, 124, 125-6. 

Pilot sampling, 252, N.R., 266. 

Pitman, E. J. G., tests of significance, 128-32, 
136 ; z-test, 211 ; tests of hypotheses, 
323-6 ; Exercises from, (Exercises 17.9, 
17.10, 17.11) 47, (Exercise 21.3) 138, 
(Exercise 21.16) 140, (Exercise 27.2) 326, 
N.R., 45, 137, 216. 

Plant breeding, Bibl., Y. Tang (1938) 494. 

Plot arrangements, Bibl., Tedin (1931) 494. See 
Design. 

Poisson distribution, (Example 17.9) 21-2 ; confidence 
intervals for, (Example 19.4) 70-1, 
81 ; conditional test for, (Example 21.12) 
127 ; in variance-analysis, 206-7. 

Bibl. : Ackermann (1939) 442, R. A. 
Chapman (1938) 451, Cochran (1936a, 
1940b) 452, Copeland and Regan (1936) 453, 
Doetsch (1934) 457, Fisher and others 
(1922c) 461, Garwood (1936) 464, Irwin 
(1935, 1937a) 470, Lévy (1937a) 475, Lüders 
(1934) 476, Molina (1942) 478, Poisson (1837) 
487, Przyborowski and Wilenski (1940) 487, 
Raikov (1936) 487, Ricker (1937) 488, 
Satterthwaite (1943) 490, “ Student ” (1907, 
1919) 493, Sukhatme (1937b, 1938a) 494, 
von Bortkiewicz (1898, 1910) 496, Weida 
(1935) 498, Whitaker (1914) 499. 

Poisson’s theorem in probability, Bibl., Bochner
(1936) 447, Bonferroni (1933) 447. See 
Central Limit Theorem. 
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Polya distribution, Bibl., del Chiaro (1936) 456, 
S. Guldberg (1935) 466. See Negative 
Binomial. 

Polychoric correlations, Bibl., Pearson and Pearson
(1922b) 485, Ritchie-Scott (1918) 489.

Polynomials, expansions in, Bibl., Cacciopolli
(1932) 450, Davis (1933) 455. See Ortho-
gonal Polynomials, Curve Fitting.

Population of England and Wales, (Example 
22.7) 161-3, (Examples 22.8, 22.9) 164-7,
(Table 29.2, Figure 29.2) 365. 

— analysis, Bibl. : Lotka (1938, 1939) 476, 

Pearl and Reed (1923) 482, Volterra (1936) 
496. 

Potato yields, (Example 21.11) 126. 

Power of a test, 272, 307-8. Bibl.: G. W. Brown
(1939) 449, Dantzig (1940) 455, Eisenhart
(1938) 459, MacStewart (1941) 476, Simaika
(1941) 491, P. L. Hsu (1941b) 469, P. C.
Tang (1938) 494. See also Statistical
Hypotheses.

Powers of normal variates, Bibl., Haldane (1942a) 
467. 

Prediction, see Forecasting. 

Pretorius, S. J., N.R., 173.

Principal components, Bibl.: Girshik (1936) 465,
Hotelling (1933, 1936a) 469, Landahl (1938)
474, Ledermann (1938) 475, Thurstone
(1935) 495.

Probability, Bibl.: Bartlett (1933b) 445, Beck
(1936) 446, Belardinelli (1934) 446, Borel
(1939) 447, Broderick (1937) 449, Cantelli
(1932, 1933b) 450, Castelnuovo (1932) 451,
Cramér (1937, 1938, 1939) 454, de Finetti
(1933a, b, 1939a) 456, Doeblin (1938) 457,
Doob (1934b, 1941) 457, Eggenberger (1924)
459, Erdélyi (1937) 459, Khintchine (1937b)
473, Kolmogoroff (1931, 1933a) 473, Lévy
(1931a, 1931c, 1936a, 1937a, 1938a) 475,
Lomnicki (1923) 476, Marchand (1937) 477,
McKinsey (1939) 477, Moisseiev (1937) 478,
Nagel (1936) 479, Reichenbach (1937) 488,
Rice (1938) 488, Romanovsky (1931a) 489,
Tornier (1929, 1930, 1936, 1937) 495, von
Mises (1919a, b, 1928, 1931, 1936a, b, 1939c,
1941) 497, Urban (1918) 496, Uspensky
(1937) 496.

Probits, Bibl., Bliss (1935, 1937) 447.

Product, distribution of, Bibl., C. C. Craig (1936a) 
454. 

Product-moment correlation, see Correlation. 

Proficiency test of recruits, (Example 24.7) 240-2. 

Proportionate frequencies, in variance-analysis, 228.

Proportions, tests of, Bibl., Swaroop (1938) 494.

Quadratic forms, see Independence of Quadratic 
Forms. 

Quality control, Bibl.: Becker and others (1930)
446, Jennett and Welch (1939) 471, E. S.
Pearson (1933a, 1934) 482, Shewhart (1931)
491, Simon (1941) 491, Welch (1936b) 498,
Wilks (1941) 500, Wolfowitz (1943) 501.

Quartiles, Bibl., Hojo (1931, 1933) 469.

Quasi-Latin squares, Bibl., Yates (1937a) 502.

Quasi-sufficiency, Bibl., Bartlett (1940) 445. See 
Conditional Statistics. 

Racial likeness, N.R., 358. Bibl., Morant (1939)
478, K. Pearson (1926b) 485. See Multi-
variate Analysis.

Rainfall in London, (Table 29.4, Figure 29.4) 367.

Random component in time-series, 369 ; effect of 
trend-elimination on, 378-87 ; tests for, 
399. 

— migration, Bibl., Brownlee (1911) 449.

— occurrences, Bibl., Morant (1921) 478.

— order, tests of, 122-7. Bibl.: (runs, etc.)
André (1884) 444, Besson (1920) 446, Borel
(1933) 447, Denk (1936) 456, Fisher (1926b)
461, Gumbel (1943a) 466, Jones (1937c)
472, Kaucky (1936) 472, Mood (1940) 478,
von Bortkiewicz (1915a, 1917) 496, von
Mises (1921) 497, Wolfowitz (1943) 501.

— paths, Bibl., McCrea (1936) 477, Polya
(1938b) 487.

— samples, tables of, Bibl., Mahalanobis and
others (1934) 476.

— sampling numbers, Bibl.: Kendall and
Babington Smith (1939a) 472, K. R. Nair
(1938a) 479, Yule (1938a) 503.

— sequence, Bibl.: Copeland (1928, 1929,
1932, 1936, 1937) 453, Dorge (1934, 1936)
458, Grevilla (1939) 466, Regan (1936,
1938) 487, Rice (1939) 488, Swed and
Eisenhart (1943) 494, Ville (1936a, b) 496,
von Mises (1931, 1933) 497, Wald (1936b,
1937) 497, Young (1941) 502.

— variables, Bibl.: Cramér (1935a) 454, Cramér
and others (1938) 454, de Finetti (1929)
455, Eyraud (1938b) 459, Lévy (1934,
1935a, b, 1936c, 1939a, b) 475. See Proba-
bility.

Randomisation, and z-test, 209-13, 255-6; in
design, 263-6. Bibl., E. S. Pearson (1937b,
1938) 483; and see Design.

Randomised blocks, 213-14. Bibl.: Cornish
(1940a) 453, McCarthy (1939) 477, Welch
(1937) 498. See Blocks.

Randomness, Bibl.: Borel (1937) 447, Dodd
(1942) 457, Kendall (1941) 472, Kermack
and McKendrick (1936, 1937) 473, Wiener
(1938) 499.

Range, test of, (Exercise 27.3) 327. Bibl. : Geary 
(1943) 464, Hartley (1942) 467, McKay and 
Pearson (1933) 477, Newman (1939) 480, 
Olds (1935) 481, E. S. Pearson (1926, 1932) 





482, Pearson and Haines (1936a) 482, 
Pearson and Hartley (1942, 1943) 483, 
Romanovsky (19336) 489, W. R. Thompson 
(1938) 494, Tippett (1925) 495.

Rank correlation, 123, 441. Bibl.: Daniels (1944)
456, Dantzig (1939) 455, Dubois (1939) 458,
Hotelling and Pabst (1936c) 469, Kendall
(1938b, 1942a) 472, Kendall and others
(1939, 1939b) 472, Olds (1938b) 481, K.
Pearson (1914, 1921) 484, Pearson and
Pearson (1931c, 1932) 486, “Student”
(1921) 493, Wallis (1939) 498, Watkins
(1933) 498, Woodbury (1940) 501.

Ratio, distribution of, Bibl.: C. C. Craig (1929b)
453, Curtiss (1941) 454, Fieller (1932b) 460,
Geary (1930) 464, Gordon (1941) 466,
Hirschfeld (1937) 468, Kullback (1936a)
474, Nicholson (1941) 481, van Uven (1932,
1939) 496.

Rectangular distribution, estimation of extremes,
(Example 17.15) 28; intrinsic accuracy,
(Example 17.11) 47; estimation by sample-
centre, (Exercise 17.16) 48; confidence
intervals for range, (Exercise 19.1) 83.
Bibl.: O. L. Davies (1932) 455, Dunlap
(1931) 458, Hall (1927b) 467, Olds (1935)
481, Rietz (1931a) 488.

Region of acceptance, 63, 76, 270. 

Regression, Gauss’ theorem on residuals, 60-1;
generally, 141-74; analytical theory,
141-5; fitting of curvilinear regressions,
145-53; standard errors and tests of sig-
nificance, 153-8; equal steps of variate,
159-67; multiple curvilinear, 167; addi-
tion of new variates, 167-72; in analysis
of variance, 233-6; relation with Hotelling’s
T, 336-7; in discriminatory analysis, 344-6.

Bibl.: R. G. D. Allen (1939) 443, H. V.
Allen (1938) 443, Andersson (1932) 443,
(1934) 444, Bartlett (1933a, 1938c) 445, F.
Bernstein (1937) 446, Blakeman (1905) 447,
S. S. Bose (1934a, b, 1938b) 448, Camp
(1925b) 450, Cochran (1938a) 452, Dodd
(1937b, c) 457, Dwyer (1937b, 1941c) 458,
Eisenhart (1939) 459, Ezekiel (1930b) 460,
Fisher (1922b) 461, Galton (1886) 464,
Jones (1937b) 472, Koopmans (1937) 474,
Mendershausen (1937a) 477, T. V. Moore
(1937) 478, Neyman (1926) 480, K. Pearson
(1896) 483, (1921, 1926a) 485, Quensel
(1936) 487, Richards (1931) 488, Roman-
ovsky (1926, 1931b) 489, Slutzky (1914)
491, K. Smith (1918) 492, Waugh (1942)
498, Welch (1935) 498, Wicksell (1934b)
499, Yates (1939d) 502, Yule (1936) 503.

— coefficients, standard error of, 153-6; exact
tests of, 156-8.

Regular unbiassed critical regions, 318-19. 


Rejection of observations, Bibl.: Irwin (1926b)
470, Pearson and Chandra Sekhar (1930)
483, Rider (1933) 488, W. R. Thompson
(1935) 494.

Relaxed oscillations, Bibl., Le Corbeiller (1933)
476, van der Pol (1930) 496. 

Reliability coefficients, Bibl., Stouffer (1936b) 493.

Replication, 265. Bibl.: Bartlett (1938a) 446,
Cochran (1937b, 1938b, 1939a) 452, Yates
(1933a, b) 600, (1936d) 501. See Design.

Representative method of sampling, Bibl.: A. T.
Craig (1939) 453, Jensen (1925) 471, Ney-
man (1933b, 1934) 480, Sukhatme (1935)
493.

Residual, in variance-analysis, 178, 186-7. 

Ricker, W. E., confidence intervals for Poisson 
distribution, 81. 

Riemann zeta-function, Bibl., Jessen and Wintner
(1936) 471. 

Risk, theory of, Bibl., Cramér (1923) 454, Esscher
(1932) 459.

Robinson, G., N.R., 394, 437. 

Roots of equations, distribution of, Bibl., Girshik
(1939, 1942) 465. 

Routine analysis, Bibl.: Neyman (1939b, 1941b)
480, Przyborowski and Wileński (1936b)
487, “Student” (1927) 493.

Roy, S. N., distribution of canonical correlations,
357 and N.R., 359.

Runs, in time-series, see Random Order. 

Sampling distributions, moments of, see k-statistics,
Moments. 

— inquiries, see Design.

—, miscellaneous, Bibl.: Bartky (1943) 446,
Bartlett (1937b) 446, Baten (1933b) 446,
Bowley (1925) 448, Burks (1933) 450, Clap-
ham (1931, 1936) 452, Cochran (1936b,
1939b, 1942b) 452, A. T. Craig (1933a, b)
453, C. C. Craig (1931a) 453, Crum (1933)
454, David (1938b) 456, Hey (1938) 468,
Hilton (1924, 1928) 468, Kiser (1934) 473,
McKay (1934) 477, Neyman (1933a, 1934,
1938a) 480, Olds (1939, 1940) 481, Panse
(1939) 482, E. S. Pearson (1933a, 1934)
482, Pepper (1929) 486, Rhodes (1925) 488,
Rider (1931b) 488, Rietz (1937) 488, Shew-
hart and Winters (1928) 491, “Sophister”
(1928) 492.

— surveys, Bibl., A. N. Bose (1941) 447, C.
Bose (1943) 447; and see Sampling, miscel-
laneous.

Sasuly, M., N.R., 394.

Savur, S. R., N.R., 83.

Scale, estimation of parameters of, 40-2; elimina-
tion of parameters of, 79-80; Pitman’s
tests of, 323-6. Bibl., Pitman (1939a, b)
486.
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Scale, reading, Bihl., Yule (19276) 503. 

Scales of measurement, Bibl., Cochran (1943) 452.

Scatterance, N.R., 358.

Scedastic curve, 142. 

Scheffé, H., non-parametric tests, 322; N.R.,
304, 326.

Schoolchildren, tests of, (Example 25.1) 258-9, 
(Example 28.4) 351-2.

Schultz, H., N.R., 394. 

Schuster, Sir Arthur, significance of periodogram, 
434; N.R., 437. 

Seasonal effect, in time-series, 369. Bibl.: Bow-
ley and Smith (1924) 448, Carmichael (1931)
451, Carver (1932) 451, Crum (1925) 454,
Detroit Edison Co. (1930) 456, Donner
(1928) 457, Falkner (1924) 460, Gressens
(1925) 466, Mendershausen (1937b) 478,
Robb (1929, 1930) 489, Wald (1936a) 497,
Wisniewski (1934) 501, Zrzavy (1933) 503.

Second Limit Theorem, Bibl., Fréchet and Shohat
(1931) 463.

— moment, see Variance.

Seed in optical glass, (Example 23.6) 202-5. 

Seeds of wheat, germination of, (Example 23.7) 
207-9. 

Selective confidence intervals, 75-6. 

Semi-normal distribution, Bibl., Steffensen (1937)
492. 

Seminvariants, see Cumulants, k-statistics.

Sensitivity, of tests of significance, 256.

Serial correlation, 402-4. See Correlogram. Bibl.:
R. L. Anderson (1942) 443, Bartlett (1935c)
445, Dixon (1944) 456, Kendall (1944a, b)
473, Koopmans (1942) 474, Marples (1932)
477, Schumann and Hofmeyer (1942) 490,
Yule (1921) 502, (1926, 1927a) 503.

Sheep population of England and Wales, (Table 
29.3, Figure 29.3) 366, (Example 29.5) 
385-6, (Example 30.5) 411, (Example 30.8)
416-18. 

Sheppard’s corrections, see Grouping Corrections. 

Shortest confidence intervals, 71-5, 75-6. 

Significance tests, 96-140, 269-327. See Statistical
Hypotheses. Bibl., Jeffreys (1938a) 471,
Peiser (1943) 486.

Silverstone, H., minimum variance, 61; (Exer-
cises 18.1, 18.2) 61.

Simaika, J., N.R., 304, 359. 

Similar regions, 283. Bibl., Feller (1938) 460.

Simon, L. E., N.R., 61. 

Simple hypotheses, 269, 272-82, 317-26. 

Simultaneous estimation, of several parameters, 
34-44. 

— fiducial distributions, Bibl., Bartlett (1939a)
445.

Sinusoidal limit, N.R., 394. Bibl.: Marsueguerra
(1936) 477, Romanovsky (1931c, 1932a,
1933a) 489, Slutzky (1937b) 491.


Skewness, Bibl., Frisch (1934a) 464, Garner (1932)
464. 

Skulls (Egyptian), (Example 28.3) 345-8.

Slutzky, E., N.R., 394, 399. 

Slutzky-Yule effect, 378-87, 399. Bibl., Slutzky
(1937b) 491, Yule (1921) 502.

Small numbers, law of, see Poisson Distribution. 

Smirnoff, N., ω²-test, 109.

Smith, H. Fairfield, N.R., 359. 

—, K., minimum-χ², 55 and N.R., 61.

Smoothing, see Moving Averages, Trend. 

Soil, loss of weight in, (Example 22.3) 149-52, 
(Example 22.6) 158. 

Solomon, L., footnote, 51. 

Spearman, C., (Exercise 25.3) 267. 

Spearman’s factor theory, see Factor Analysis. 

— ρ, test of, 132.

Speed tests in children, (Example 28.4) 351-2. 

Spelling ability in children, (Example 25.1) 258-9.

Spencer’s formula in curve fitting, (Examples 29.2, 
29.3) 376-7, 378-80, (Exercise 29.3) 394-5, 
(Example 30.2) 405. 

Spurious correlation, Bibl.: K. Pearson (1897b)
483, Spearman (1907, 1910) 492, Wicksell 
(1921) 499. 

Square of a variate, Bibl., Haldane (1941) 467.

Squariance, footnote, 178.

Stabilising of variance, 207.

Stability of series, see Lexis Theory. 

Stable laws of probability, Bibl.: Bochner (1937)
447, Feldheim (1937a) 460, Khintchine and
Lévy (1936) 473, Khintchine (1938) 473.

Standard deviation, estimation of, (Example 17.5)
6-7, (Example 17.6) 11, 52. See Variance.

— errors, in testing significance, 97-8; of
regression coefficients, 153-6. Bibl.: Derk-
sen (1939) 456, Edgeworth (1908, 1909)
459, Eels (1929) 459, Hendricks (1934) 468,
Isserlis (1915, 1916) 470, Miller (1934) 478,
K. Pearson (1903, 1913, 1920) 484, (1924d)
485, K. Pearson and Lee (1908) 484, K.
Pearson and Filon (1898) 483.

— Latin squares, 259.

Stationary time-series, 396. Bibl.: Khintchine
(1932, 1933, 1934) 473, Slutzky (1934) 491,
Wold (1938a, 1939) 501. See Time-series,
Correlogram.

Statistical hypotheses, definition, 269; errors of
first and second kind, 270-2; power
function, 272; simple hypotheses, 272-5;
best critical regions, 277-80; relation with
sufficient estimators, 281-2; composite
hypotheses, 282-3; similar regions, 283-7;
of several degrees of freedom, 287; linear
hypotheses, 292-5; likelihood criteria,
295; k samples, 295-302; bias, 307-26;
regions of Type A, 309-14, of Type A₁,
314-16, of Type B, 316-17, of Type C,
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317—22 ; limiting properties, 322 ; Pitman’s 
tests, 323-6. 

Bibl.: G. W. Brown (1940) 449, Chandra
Sekhar and Francis (1941) 451, Daly (1940)
454, Dantzig (1940) 455, Gumbel (1942)
466, R. W. Jackson (1936) 471, Kolod-
zieczyk (1933, 1935) 474, Neyman (1935b,
1938b) 480, (1942) 481, Neyman and Pear-
son (1928, 1931a, 1933a, c, 1936a, 1938)
480, E. S. Pearson (1941, 1942a) 483,
Pitman (1939b) 486, Rietz (1938) 488,
Scheffé (1942a, 1943) 490, Wald (1939a)
497, (1941a) 498, Wilks (1935c, 1938a) 499,
Wolfowitz (1942) 501.

Statistical Review of England and Wales, data from,
(Example 21.8) 120, (Example 21.9) 121. 

Stevens, W. L., test of significance in periodogram, 
434; N.R., 216. 

Stieltjes integrals, Bibl., Shohat (1930) 491.

Stochastic convergence, 440. See Convergence in 
Probability. 

— dependence, see Independence.

— processes, Bibl., Doob (1934a, 1937, 1938)
457, Feller (1936a) 460. See Probability.

Stock forecasting, Bibl., Cowles (1933) 453, Cowles 
and Jones (1937) 453. 

Stock, J. S., N.R., 266. 

Stratified sampling, 249-52. Bibl.: P. H. Ander-
son (1942) 443, Baker (1930c) 444, G. M.
Brown (1933) 449, Frankel and Stock (1939)
463, McKay (1934) 477, Mood (1943) 478.
See also Sampling, miscellaneous, Repre-
sentative Method.

“Student” (W. S. Gosset), see Gosset.

Studentisation, 79-81, 134. Bibl., Hartley (1938,
1944) 467, Newman (1939) 480. 

“Student’s” distribution, confidence intervals
based on, 79-80; fiducial inference based
on, 88; properties of, 100-2; in testing
mean, 98-100; in non-normal case, 102-4;
other uses, 104; in testing two means,
109-10, 113-14; in testing Spearman’s ρ,
124; in Pitman’s tests, 131, 132; in testing
regressions, 156, 158, 172; in analysis of
covariance, 244; (Example 26.9) 291.

Bibl.: Bartlett (1935a) 445, C. C. Craig
(1941a) 454, Daniels (1938a) 454, Fisher
(1926a) 461, Geary (1936b) 464, Hendricks
(1936) 468, P. L. Hsu (1938a) 469, N. L.
Johnson and Welch (1940a) 471, Kerrich
(1937) 473, Kolodzieczyk (1933) 474, Lader-
mann (1939) 474, McKay and others (1932)
477, Merrington (1942) 478, A. N. K. Nair
(1942) 479, Perlo (1933) 486, Rider (1929)
488, Rietz (1939) 488, Steffensen (1936) 492,
“Student” (1908a, 1931a) 493, Treloar and
Wilder (1934) 495.

— hypothesis, 285-7. Bibl., Neyman and
Tokarska (1936b) 480, Przyborowski and
Wileński (1935a) 487.

Stumpff, K., N.R., 437. 

Sufficient estimators, 7-12; given by maximum
likelihood, 19; general form possessing,
24-5; distribution of, 25; when range
depends on parameter, 27-8; for several
parameters, 39-40; giving minimum-
variance estimators, 52; relation with
confidence intervals, 74-5, 79; relation
with U.M.P. tests, 281-2, with U.M.P.U.
tests, 310.

Bibl.: Bartlett (1936b, 1937c, 1940) 445,
Darmois (1935) 456, Koopman (1936) 474, 
Neyman (1935a) 480, Neyman and Pearson 
(1936a) 480, Pitman (1936) 486, Welch 
(1939a) 498. 

Sukhatme, P. V., tables for Behrens’ test, 92, 111;
(Exercise 26.8) 305-6; sampling moments,
440. N.R., 94, 266, 304.

Sum, distribution of, see Means. 

Summation convention, 329. 

Sunspots, Bibl., Schuster (1906) 490, Yule (1927a)
503.

Symmetric functions, Bibl., O’Toole (1931, 1932) 
481. See Moments, k-statistics.

T-distribution, see Hotelling’s T. 

Tabular differences, Bibl., Ladermann and Lowan
(1939) 474. 

Tanburn, E., N.R., 137. 

Tang, P. C., linear hypotheses, 301 ; N.R., 303. 

Tchebycheff, P. L., (Exercise 22.4) 173; N.R., 172.

Tchebycheff-Hermite polynomials, Bibl.: Doetsch
(1934) 457, Erdélyi (1938) 459, Feldheim
(1937b) 460. See Gram-Charlier Series,
Orthogonal Polynomials.

Tchebycheff’s inequality, Bibl.: Berge (1938)
446, Bernstein (1937) 446, Camp (1922) 450,
C. C. Craig (1933) 454, K. Pearson (1919)
485, C. D. Smith (1930) 491.

Tea-drinking, Bibl., Mahalanobis (1943) 476. 

Telephone service, Bibl., Newland and Neal (1939) 
479, Palm (1937) 482. 

Terminals of frequency-distribution, confidence 
intervals for, 83. 

Test construction, Bibl., Cureton and Dunlap
(1938) 454. 

Tests of significance, see Significance, Statistical
Hypotheses. 

Tetrachoric functions, Bibl.: J. Henderson (1922)
468, K. Pearson (1912a, 1913a, b) 484, K.
Pearson and Heron (1913c) 484, Newbold
(1925) 479, Pearson and Pearson (1922b)
485.

Tetrad difference, (Exercise 28.10) 362. Bibl.,
Hotelling (1936b) 469, Wilks (1932d) 499.
See Factor Analysis. 
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Third moment, distribution of, Bihl., Pepper 
(1932) 486. 

Thompson, C., on ;i-tests, 299; N.R., 303. 

Thompson, W. R., (Exercise 19.6) 84; N.R., 83.

Thomson, G., (Example 25.1) 258-9. 

Ties in ranking, 127, 441. 

Time-series, 363-439; examples of, 363-9; trend,
371-8; effect of trend elimination, 378-87;
variate difference method, 387-94; oscilla-
tions, 397-9; tests for randomness, 399;
types of oscillatory series, 395-402; serial
correlations, 402-4 ; correlogram, 404-13 ; 
autoregressive schemes, 414-21 ; auto- 
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